Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2021/05/17 08:52:22 UTC
[GitHub] [hudi] PavelPetukhov opened a new issue #2959: No data stored after migrating to Hudi 0.8.0
PavelPetukhov opened a new issue #2959:
URL: https://github.com/apache/hudi/issues/2959
While working with Hudi 0.7.0 we were able to store data from Kafka topics to HDFS.
We tried to migrate to 0.8.0, but we discovered strange behavior:
spark-submit finishes with status SUCCEEDED, but no data is actually stored in HDFS.
Only the .hoodie folder is created in the desired location, with entries like .aux, .temp, deltacommit.inflight, deltacommit.requested, hoodie.properties, archived.
The spark-submit command looks like this (only Hudi-related configurations are attached; I can send the full request if necessary):
(
Please note that 0.7.0 with the same config worked (data was stored as expected); the only changes are:
hudi-utilities-bundle_2.12:0.8.0 instead of hudi-utilities-bundle_2.11:0.7.0
spark-avro_2.12:2.4.7 instead of spark-avro_2.11:2.4.7
hoodie-utilities.jar taken from hudi-0.8.0-utilities-2.12.jar instead of hudi-0.7.0-utilities-2.11
)
/usr/local/spark/bin/spark-submit --conf "spark.yarn.submit.waitAppCompletion=false" \
--packages org.apache.hudi:hudi-utilities-bundle_2.12:0.8.0,org.apache.spark:spark-avro_2.12:2.4.7 \
--master yarn \
--deploy-mode cluster \
--name xxx \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
/app/hoodie-utilities.jar \
--op BULK_INSERT \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--source-ordering-field __null_ts_ms \
--schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--enable-hive-sync \
--target-base-path xxx \
--target-table xxx \
--hoodie-conf "hoodie.datasource.hive_sync.enable=true" \
--hoodie-conf "hoodie.datasource.hive_sync.table=foo" \
--hoodie-conf "hoodie.datasource.hive_sync.partition_fields=date:TIMESTAMP" \
--hoodie-conf "hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.SlashEncodedDayPartitionValueExtractor" \
--hoodie-conf "hoodie.datasource.hive_sync.jdbcurl=" \
--hoodie-conf "hoodie.upsert.shuffle.parallelism=2" \
--hoodie-conf "hoodie.insert.shuffle.parallelism=2" \
--hoodie-conf "hoodie.delete.shuffle.parallelism=2" \
--hoodie-conf "hoodie.bulkinsert.shuffle.parallelism=2" \
--hoodie-conf "hoodie.embed.timeline.server=true" \
--hoodie-conf "hoodie.filesystem.view.type=EMBEDDED_KV_STORE" \
--hoodie-conf "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-dd'T'HH:mm:ssZ,yyyy-MM-dd'T'HH:mm:ss.SSSZ" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.timezone=" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd" \
--hoodie-conf "hoodie.deltastreamer.schemaprovider.registry.url=xxx" \
--hoodie-conf "xxx" \
--hoodie-conf "auto.offset.reset=earliest" \
--hoodie-conf "group.id=hudi_group" \
--hoodie-conf "schema.registry.url=xxx" \
--hoodie-conf "hoodie.parquet.small.file.limit=0" \
--hoodie-conf "hoodie.clustering.inline=true" \
--hoodie-conf "hoodie.clustering.inline.max.commits=4" \
--hoodie-conf "hoodie.clustering.plan.strategy.target.file.max.bytes=1073741824" \
--hoodie-conf "hoodie.clustering.plan.strategy.small.file.limit=629145600" \
--hoodie-conf "hoodie.datasource.write.recordkey.field=id" \
--hoodie-conf "hoodie.datasource.write.partitionpath.field=date:TIMESTAMP" \
--hoodie-conf "hoodie.deltastreamer.source.kafka.topic=xxx"
* Hudi version : 0.8.0
* Spark version : 2.4.7
* Storage (HDFS/S3/GCS..) : hdfs
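The only changes reported between the working and the failing run are the Hudi version and the Scala suffix of the artifacts (`_2.11` to `_2.12`). Spark 2.4.x was published in both Scala 2.11 and Scala 2.12 builds, so it can be worth confirming that the suffix matches the Spark installation. A minimal sketch, assuming the Scala version is read from the `Using Scala version ...` line printed by `spark-submit --version`; `scala_suffix`, `spark_scala_version` and `check_packages` are hypothetical helper names, not part of Spark or Hudi, and the thread does not establish that a Scala mismatch caused this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the names are illustrative only.

# Print the Scala binary version embedded in a Maven coordinate,
# e.g. org.apache.spark:spark-avro_2.12:2.4.7 -> 2.12
scala_suffix() {
  printf '%s\n' "$1" |
    sed -n 's/^[^:]*:[^:]*_\([0-9][0-9]*\.[0-9][0-9]*\):.*$/\1/p'
}

# Read `spark-submit --version` text on stdin and print the Scala
# binary version (major.minor) it reports.
spark_scala_version() {
  sed -n 's/.*Using Scala version \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# check_packages <spark_scala_version> <comma-separated coordinates>
# Prints one line per coordinate whose suffix differs from the expected
# version; returns 1 if any coordinate differs, 0 otherwise.
check_packages() {
  local expected="$1" status=0 coord suffix
  local IFS=','
  for coord in $2; do
    suffix="$(scala_suffix "$coord")"
    if [ -n "$suffix" ] && [ "$suffix" != "$expected" ]; then
      echo "MISMATCH: $coord is built for Scala $suffix, Spark reports $expected"
      status=1
    fi
  done
  return "$status"
}
```

A possible use is `check_packages "$(spark-submit --version 2>&1 | spark_scala_version)" "org.apache.hudi:hudi-utilities-bundle_2.12:0.8.0,org.apache.spark:spark-avro_2.12:2.4.7"`. If Spark reports 2.12 this prints nothing; if it reports 2.11 both coordinates are flagged.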
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] PavelPetukhov edited a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov edited a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848885930
@n3nash
The .hoodie directory structure is the following:
hdfs dfs -ls /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie
Found 7 items
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /path_to_location/foo/.hoodie/.aux
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /path_to_location/foo/.hoodie/.temp
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /path_to_location/foo/.hoodie/20210526183328.deltacommit
-rw-r--r-- 3 hdfs hadoop 518 2021-05-26 18:33 /path_to_location/foo/.hoodie/20210526183328.deltacommit.inflight
-rw-r--r-- 3 hdfs hadoop 0 2021-05-26 18:33 /path_to_location/foo/.hoodie/20210526183328.deltacommit.requested
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /path_to_location/foo/.hoodie/archived
-rw-r--r-- 3 hdfs hadoop 391 2021-05-26 18:33 /path_to_location/foo/.hoodie/hoodie.properties
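In a Hudi timeline of this version, each delta commit is tracked by files named after its instant time: `<instant>.deltacommit.requested`, then `<instant>.deltacommit.inflight`, and finally `<instant>.deltacommit` once the write completes. A rough sketch, assuming `hdfs dfs -ls` output is piped on stdin, that reports the most advanced state seen for each delta commit instant. `timeline_states` is a hypothetical helper; it looks only at entry names, not at whether an entry is a file or a directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarise delta commit instants from `hdfs dfs -ls`
# output read on stdin. The path is taken to be the last field of each line.
timeline_states() {
  awk '
    {
      n = split($NF, parts, "/")   # keep only the entry name
      name = parts[n]
    }
    name ~ /^[0-9]+\.deltacommit(\.(requested|inflight))?$/ {
      m = split(name, seg, ".")
      instant = seg[1]
      # Two segments means the completed marker; three carry the state.
      state = (m == 3) ? seg[3] : "completed"
      rank = (state == "completed") ? 3 : (state == "inflight") ? 2 : 1
      if (rank > best[instant]) { best[instant] = rank; label[instant] = state }
    }
    END { for (i in label) print i, label[i] }
  '
}
```

Run against the listing above, this prints `20210526183328 completed`, because an entry named `20210526183328.deltacommit` is present. Note that the listing shows that entry with directory permissions (`drwxr-xr-x`), which this sketch does not examine. A listing with only the `.requested` and `.inflight` entries would print `20210526183328 inflight`.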
Also, I have removed everything unrelated, so the request looks like this:
/usr/local/spark/bin/spark-submit --conf "spark.yarn.submit.waitAppCompletion=false" \
--conf "spark.dynamicAllocation.minExecutors=1" \
--conf "spark.dynamicAllocation.maxExecutors=10" \
--conf "spark.dynamicAllocation.enabled=true" \
--conf "spark.dynamicAllocation.shuffleTracking.enabled=true" \
--conf "spark.shuffle.service.enabled=true" \
--conf "spark.eventLog.enabled=true" \
--conf "spark.eventLog.dir=hdfs://xxx/eventLogging" \
--conf "spark.executor.memoryOverhead=384" \
--conf "spark.driver.memoryOverhead=384" \
--conf "spark.driver.extraJavaOptions=-DsparkAappName=xxx -DlogIndex=GOLANG_JSON -DappName=data-lake-extractors-streamer -DlogFacility=stdout" \
--packages org.apache.spark:spark-avro_2.12:2.4.7 \
--master yarn \
--deploy-mode cluster \
--name xxx \
--driver-memory 2G \
--executor-memory 2G \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
hdfs://xxx/user/hudi/hudi-utilities-bundle_2.12-0.8.0.jar \
--op UPSERT \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--source-ordering-field __null_ts_ms \
--schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--target-base-path /user/hdfs/raw_data/public/xxx/yyy \
--target-table xxx \
--hoodie-conf "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-ddTHH:mm:ssZ,yyyy-MM-ddTHH:mm:ss.SSSZ" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.timezone=" \
--hoodie-conf "hoodie.upsert.shuffle.parallelism=2" \
--hoodie-conf "hoodie.insert.shuffle.parallelism=2" \
--hoodie-conf "hoodie.delete.shuffle.parallelism=2" \
--hoodie-conf "hoodie.bulkinsert.shuffle.parallelism=2" \
--hoodie-conf "hoodie.embed.timeline.server=true" \
--hoodie-conf "hoodie.filesystem.view.type=EMBEDDED_KV_STORE" \
--hoodie-conf "hoodie.deltastreamer.schemaprovider.registry.url=http://xxx/subjects/xxx-value/versions/latest" \
--hoodie-conf "bootstrap.servers=xxx" \
--hoodie-conf "auto.offset.reset=earliest" \
--hoodie-conf "group.id=hudi_group" \
--hoodie-conf "schema.registry.url=http://xxx" \
--hoodie-conf "hoodie.datasource.write.recordkey.field=id" \
--hoodie-conf "hoodie.datasource.write.partitionpath.field=date:TIMESTAMP" \
--hoodie-conf "hoodie.deltastreamer.source.kafka.topic=xxx"
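One visible difference from the command in the original report is the input date format: it was `yyyy-MM-dd'T'HH:mm:ssZ` there and is `yyyy-MM-ddTHH:mm:ssZ` here. Java-style date patterns (`java.text.SimpleDateFormat`, Joda-Time) expect literal letters such as `T` to be wrapped in single quotes. A small sketch, using the hypothetical helpers `has_bare_T` and `check_dateformats`, for spotting an unquoted `T` in each comma-separated pattern. Whether this difference is related to the missing data is not established in the thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if the pattern contains a T outside single quotes.
has_bare_T() {
  # Drop every '...'-quoted segment, then look for a remaining T.
  printf '%s\n' "$1" | sed "s/'[^']*'//g" | grep -q 'T'
}

# Check each comma-separated pattern of a dateformat value and print
# the ones that contain an unquoted T.
check_dateformats() {
  local IFS=',' pattern
  for pattern in $1; do
    if has_bare_T "$pattern"; then
      echo "unquoted T: $pattern"
    fi
  done
}

# Single quotes inside a double-quoted shell argument reach the program
# unchanged, so the quoted form can be passed through --hoodie-conf as is:
printf '%s\n' "yyyy-MM-dd'T'HH:mm:ssZ"
```

Calling `check_dateformats "yyyy-MM-ddTHH:mm:ssZ,yyyy-MM-ddTHH:mm:ss.SSSZ"` reports both patterns, while the quoted variants from the original command produce no output.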
[GitHub] [hudi] PavelPetukhov edited a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov edited a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848891756
This is our full log
[spark_log.txt](https://github.com/apache/hudi/files/6548390/spark_log.txt)
[GitHub] [hudi] nsivabalan commented on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
nsivabalan commented on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-918433157
@PavelPetukhov: sorry for the very late turnaround. Were you able to get it resolved? Erasing of all data seems very strange. We are definitely interested in looking into the issue.
[GitHub] [hudi] vinothchandar commented on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-926254291
I can't find any exceptions in the log. Please feel free to reopen if this is still an issue.
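One way to double-check a long driver log is to count entries per log level and list everything above INFO. A minimal sketch, assuming the log uses the `yy/MM/dd HH:mm:ss LEVEL logger: message` layout seen in this thread; `log_levels` and `log_problems` are hypothetical helpers, and a clean driver log would not rule out failures that are only logged on executors.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for a Spark driver log read from stdin.

# Count entries per level. Lines that do not start with a yy/MM/dd date
# (continuation lines of multi-line entries) are skipped.
log_levels() {
  awk '$1 ~ /^[0-9][0-9]\/[0-9][0-9]\/[0-9][0-9]$/ { count[$3]++ }
       END { for (level in count) print level, count[level] }' | sort
}

# Print only WARN, ERROR and FATAL entries.
log_problems() {
  grep -E '^[0-9]{2}/[0-9]{2}/[0-9]{2} [0-9:]{8} (WARN|ERROR|FATAL) '
}
```

With the attached log saved locally, `log_levels < spark_log.txt` gives the per-level counts and `log_problems < spark_log.txt` lists the entries worth reading first.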
[GitHub] [hudi] PavelPetukhov edited a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov edited a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848930327
Below is our full log:
Log Type: stderr
Log Upload Time: Wed May 26 18:33:34 +0300 2021
Log Length: 104910
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for TERM
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for HUP
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for INT
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/05/26 18:33:18 INFO yarn.ApplicationMaster: Preparing Local resources
21/05/26 18:33:19 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/05/26 18:33:19 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1618828995116_0162_000001
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Waiting for spark context initialization...
21/05/26 18:33:19 WARN deltastreamer.SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
21/05/26 18:33:19 INFO spark.SparkContext: Running Spark version 2.4.7
21/05/26 18:33:19 INFO spark.SparkContext: Submitted application: xxx
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'sparkDriver' on port 37691.
21/05/26 18:33:20 INFO spark.SparkEnv: Registering MapOutputTracker
21/05/26 18:33:20 INFO spark.SparkEnv: Registering BlockManagerMaster
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/05/26 18:33:20 INFO storage.DiskBlockManager: Created local directory at /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/blockmgr-9de167db-4756-414e-9126-32cb562e91aa
21/05/26 18:33:20 INFO memory.MemoryStore: MemoryStore started with capacity 912.3 MB
21/05/26 18:33:20 INFO spark.SparkEnv: Registering OutputCommitCoordinator
21/05/26 18:33:20 INFO util.log: Logging initialized @2935ms
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
21/05/26 18:33:20 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
21/05/26 18:33:20 INFO server.Server: Started @3069ms
21/05/26 18:33:20 INFO server.AbstractConnector: Started ServerConnector@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:32822}
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'SparkUI' on port 32822.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@43837fbc{/jobs,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d91ba30{/jobs/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4854d5d9{/jobs/job,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@672e7ec3{/jobs/job/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@67ee182c{/stages,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@97af315{/stages/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1936a0e0{/stages/stage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@447ef19e{/stages/stage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68e36851{/stages/pool,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@352fe12b{/stages/pool/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d39f28d{/storage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e7806b5{/storage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d2a56cb{/storage/rdd,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37c6c6fc{/storage/rdd/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4599e713{/environment,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b9a0cbb{/environment/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@24299f0d{/executors,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@25594c52{/executors/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f728695{/executors/threadDump,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7456a814{/executors/threadDump/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1cef9064{/static,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@16ba2eda{/,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@dac88e2{/api,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@145850ef{/jobs/job/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d678cf2{/stages/stage/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://xxx:32822
21/05/26 18:33:20 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
21/05/26 18:33:20 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1618828995116_0162 and attemptId Some(appattempt_1618828995116_0162_000001)
21/05/26 18:33:20 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:20 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38417.
21/05/26 18:33:20 INFO netty.NettyBlockTransferService: Server created on xxx:38417
21/05/26 18:33:20 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:38417 with 912.3 MB RAM, BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManager: external shuffle service port = 7337
21/05/26 18:33:20 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b3c78ce{/metrics/json,null,AVAILABLE,@Spark}
21/05/26 18:33:21 INFO scheduler.EventLoggingListener: Logging events to hdfs://xxx:8020/eventLogging/application_1618828995116_0162_1
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
21/05/26 18:33:21 INFO client.RMProxy: Connecting to ResourceManager at xxx/10.246.4.117:8030
21/05/26 18:33:21 INFO yarn.YarnRMClient: Registering the ApplicationMaster
21/05/26 18:33:21 INFO yarn.ApplicationMaster:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/conf<CPS>/usr/hdp/2.6.0.3-8/hadoop/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>/usr/hdp/current/ext/hadoop/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR -> hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
SPARK_USER -> hdfs
command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx2048m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.driver.port=37691' \
'-Dspark.ui.port=0' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@xxx:37691 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1618828995116_0162 \
--user-class-path \
file:$PWD/__app__.jar \
--user-class-path \
file:$PWD/org.apache.spark_spark-avro_2.12-2.4.7.jar \
--user-class-path \
file:$PWD/org.spark-project.spark_unused-1.0.0.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
org.apache.spark_spark-avro_2.12-2.4.7.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.apache.spark_spark-avro_2.12-2.4.7.jar" } size: 107269 timestamp: 1622043191967 type: FILE visibility: PRIVATE
__app__.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/jars/hudi/hudi-utilities-bundle_2.12-0.8.0.jar" } size: 40399204 timestamp: 1622022896130 type: FILE visibility: PUBLIC
__spark_conf__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_conf__.zip" } size: 205423 timestamp: 1622043193955 type: ARCHIVE visibility: PRIVATE
org.spark-project.spark_unused-1.0.0.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.spark-project.spark_unused-1.0.0.jar" } size: 2777 timestamp: 1622043192905 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_libs__2858796966972713370.zip" } size: 242613518 timestamp: 1622043190403 type: ARCHIVE visibility: PRIVATE
===============================================================================
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@xxx:37691)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:21 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
21/05/26 18:33:22 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:22 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000002 on host xxx for executor with ID 1
21/05/26 18:33:22 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:25 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.3.9:49980) with ID 1
21/05/26 18:33:25 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
21/05/26 18:33:25 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
21/05/26 18:33:25 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO utilities.UtilHelpers: Adding overridden properties to file properties.
21/05/26 18:33:25 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
21/05/26 18:33:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:35696 with 912.3 MB RAM, BlockManagerId(1, xxx, 35696, None)
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Creating delta streamer with configs : {hoodie.deltastreamer.keygen.timebased.input.timezone=, hoodie.embed.timeline.server=true, schema.registry.url=http://xxx, hoodie.filesystem.view.type=EMBEDDED_KV_STORE, hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-ddTHH:mm:ssZ,yyyy-MM-ddTHH:mm:ss.SSSZ, hoodie.delete.shuffle.parallelism=2, hoodie.bulkinsert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd, group.id=hudi_group_080, auto.offset.reset=earliest, hoodie.insert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator, hoodie.deltastreamer.source.kafka.topic=xxx, bootstrap.servers=xxx:9092, hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=, hoodie.deltastreamer.schemaprovider.registry.url=http://xxx/subjects/xxx-value/versions/latest, hoodie.datasource.write.recordkey.field=id, hoodie.upsert.shuffle.parallelism=2, hoodie.datasource.write.partitionpath.field=date:TIMESTAMP}
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Initializing /user/hd_xyz/yyy/ml_xxx/foo as hoodie table /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished initializing Table of type MERGE_ON_READ from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Delta Streamer running only single round
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:26 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:26 INFO deltastreamer.DeltaSync: Checkpoint to resume from : Optional.empty
21/05/26 18:33:26 INFO consumer.ConsumerConfig: ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [xxx]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = hudi_group_080
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class io.confluent.kafka.serializers.KafkaAvroDeserializer
21/05/26 18:33:26 INFO serializers.KafkaAvroDeserializerConfig: KafkaAvroDeserializerConfig values:
schema.registry.url = [xxx]
max.schemas.per.subject = 1000
specific.avro.reader = false
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.timestamp.type' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.output.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.delete.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.upsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.keygenerator.class' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.schemaprovider.registry.url' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.insert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.embed.timeline.server' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.bulkinsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.timezone' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.filesystem.view.type' was supplied but isn't a known config.
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka version: 2.4.1
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka commitId: c57222ae8cd7866b
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka startTimeMs: 1622043206225
21/05/26 18:33:26 INFO clients.Metadata: [Consumer clientId=consumer-hudi_group_080-1, groupId=hudi_group_080] Cluster ID: 5XoPi9AYT0mbHVQEj6VEaw
21/05/26 18:33:27 INFO helpers.KafkaOffsetGen: SourceLimit not configured, set numEvents to default value : 5000000
21/05/26 18:33:27 INFO sources.AvroKafkaSource: About to read 0 from Kafka for topic :xxx
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: No new data, perform empty commit.
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Setting up new Hoodie Write Client
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Starting Timeline service !!
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Overriding hostIp to (xxx) found in spark-conf. It was null
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :EMBEDDED_KV_STORE
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating embedded rocks-db based Table View
21/05/26 18:33:27 INFO util.log: Logging initialized @9978ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
21/05/26 18:33:27 INFO javalin.Javalin:
__ __ _
/ /____ _ _ __ ____ _ / /(_)____
__ / // __ `/| | / // __ `// // // __ \
/ /_/ // /_/ / | |/ // /_/ // // // / / /
\____/ \__,_/ |___/ \__,_//_//_//_/ /_/
https://javalin.io/documentation
21/05/26 18:33:27 INFO javalin.Javalin: Starting Javalin ...
21/05/26 18:33:27 INFO javalin.Javalin: Listening on http://localhost:37089/
21/05/26 18:33:27 INFO javalin.Javalin: Javalin started in 179ms \o/
21/05/26 18:33:27 INFO service.TimelineService: Starting Timeline server on port :37089
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Started embedded timeline server at xxx:37089
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO client.AbstractHoodieClient: Timeline Server already running. Not restarting the service
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO client.AbstractHoodieWriteClient: Generate a new instant time: 20210526183328 action: deltacommit
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Creating a new instant [==>20210526183328__deltacommit__REQUESTED]
21/05/26 18:33:28 INFO deltastreamer.DeltaSync: Starting commit : 20210526183328
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED]]
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:28 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:28 INFO client.AsyncCleanerService: Auto cleaning is not enabled. Not running cleaner now
21/05/26 18:33:28 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:114
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 1 (mapToPair at SparkWriteHelper.java:54) as input to shuffle 1
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 5 (countByKey at SparkHoodieBloomIndex.java:114) as input to shuffle 0
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Got job 0 (countByKey at SparkHoodieBloomIndex.java:114) with 2 output partitions
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.2 KB, free 912.3 MB)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Driver requested a total number of 2 executor(s).
21/05/26 18:33:28 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:28 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.3 KB, free 912.3 MB)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:28 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:28 INFO cluster.YarnClusterScheduler: Adding task set 1.0 with 2 tasks
21/05/26 18:33:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:29 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000004 on host xxx for executor with ID 2
21/05/26 18:33:29 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_0 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_1 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1023 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 70 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (countByKey at SparkHoodieBloomIndex.java:114) finished in 1.177 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.8 KB, free 912.3 MB)
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.2 KB, free 912.3 MB)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Adding task set 2.0 with 2 tasks
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.246.3.9:49980
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 3, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 85 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 3) in 32 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114) finished in 0.126 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Job 0 finished: countByKey at SparkHoodieBloomIndex.java:114, took 1.627903 s
21/05/26 18:33:29 INFO yarn.YarnAllocator: Driver requested a total number of 1 executor(s).
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 1 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (collect at HoodieSparkEngineContext.java:78)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 368.5 KB, free 911.9 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 101.0 KB, free 911.8 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:38417 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 3.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:35696 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 178 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 3 (collect at HoodieSparkEngineContext.java:78) finished in 0.233 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 1 finished: collect at HoodieSparkEngineContext.java:78, took 0.236923 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:73
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 2 (collect at HoodieSparkEngineContext.java:73) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (collect at HoodieSparkEngineContext.java:73)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 368.3 KB, free 911.5 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 100.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:38417 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 4.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 5, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:35696 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 5) in 94 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 4.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 4 (collect at HoodieSparkEngineContext.java:73) finished in 0.167 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 2 finished: collect at HoodieSparkEngineContext.java:73, took 0.174163 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:149
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Registering RDD 14 (countByKey at SparkHoodieBloomIndex.java:149) as input to shuffle 2
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 3 (countByKey at SparkHoodieBloomIndex.java:149) with 2 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 7.5 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:38417 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 6.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 6, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:35696 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 6.0 (TID 7, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 6.0 (TID 6) in 60 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 6.0 (TID 7) in 36 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ShuffleMapStage 6 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.121 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:30 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 7)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.8 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 7.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 7.0 (TID 8, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 10.246.3.9:49980
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 7.0 (TID 9, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 7.0 (TID 8) in 47 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 7.0 (TID 9) in 20 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 7.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.081 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 3 finished: countByKey at SparkHoodieBloomIndex.java:149, took 0.219895 s
21/05/26 18:33:30 INFO bloom.SparkHoodieBloomIndex: InputParallelism: ${2}, IndexParallelism: ${0}
21/05/26 18:33:30 INFO bloom.BucketizedBloomCheckPartitioner: TotalBuckets 0, min_buckets/partition 1
21/05/26 18:33:30 INFO rdd.MapPartitionsRDD: Removing RDD 3 from persistence list
21/05/26 18:33:30 INFO storage.BlockManager: Removing RDD 3
21/05/26 18:33:31 INFO rdd.MapPartitionsRDD: Removing RDD 22 from persistence list
21/05/26 18:33:31 INFO storage.BlockManager: Removing RDD 22
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: countByKey at BaseSparkCommitActionExecutor.java:158
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 16 (mapToPair at SparkHoodieBloomIndex.java:266) as input to shuffle 6
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 23 (mapToPair at SparkHoodieBloomIndex.java:287) as input to shuffle 3
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 22 (flatMapToPair at SparkHoodieBloomIndex.java:274) as input to shuffle 4
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 31 (countByKey at BaseSparkCommitActionExecutor.java:158) as input to shuffle 5
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Got job 4 (countByKey at BaseSparkCommitActionExecutor.java:158) with 2 output partitions
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 5.9 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 3.3 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 10.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 10.0 (TID 10, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 10.0 (TID 11, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 10.0 (TID 10) in 50 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 10.0 (TID 11) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 10.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 10 (mapToPair at SparkHoodieBloomIndex.java:287) finished in 0.092 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 12, ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 7.1 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:38417 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 12.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 12.0 (TID 12, xxx, executor 1, partition 0, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:35696 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to 10.246.3.9:49980
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 4 to 10.246.3.9:49980
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_0 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 12.0 (TID 13, xxx, executor 1, partition 1, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 12.0 (TID 12) in 105 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_1 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 12.0 (TID 13) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 12.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 12 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.146 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 13.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 14, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 5 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 13.0 (TID 15, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 13.0 (TID 14) in 31 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 13.0 (TID 15) in 12 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 13.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.064 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 4 finished: countByKey at BaseSparkCommitActionExecutor.java:158, took 0.320123 s
21/05/26 18:33:31 INFO commit.BaseSparkCommitActionExecutor: Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=0, numUpdates=0}, partitionStat={}, operationType=UPSERT}
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.requested
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:31 INFO commit.UpsertPartitioner: AvgRecordSize => 1024
21/05/26 18:33:31 INFO view.AbstractTableFileSystemView: Took 3 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:31 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:31 INFO commit.UpsertPartitioner: Total Buckets :0, buckets info => {},
Partition to insert buckets => {},
UpdateLocations mapped to buckets =>{}
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 175
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 62
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 9
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 148
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 105
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 143
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 55
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 209
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 154
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 147
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 163
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 69
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 34
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 100
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 1
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 193
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 169
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 27
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 16
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 115
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 120
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 106
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 174
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 210
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 96
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 6
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 57
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 133
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 11
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 74
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 107
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 164
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 172
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 176
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 194
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 109
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 37
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 177
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 128
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 182
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 205
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 30
21/05/26 18:33:31 INFO commit.BaseCommitActionExecutor: Auto commit disabled for 20210526183328
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 102
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 180
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 150
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 186
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 89
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 223
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 47
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 158
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 162
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 88
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 39
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 8
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 29
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 124
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 75
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 165
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 217
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 134
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 35
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 216
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 22
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 114
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 152
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 42
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 94
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 145
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 126
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 144
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 168
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:38417 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:35696 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 149
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 38
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 70
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 15
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 118
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 166
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 207
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 170
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 171
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 65
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 97
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 110
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 222
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 87
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 192
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 201
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 117
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 123
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 12
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 60
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 84
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 127
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 91
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 136
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 45
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 200
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 64
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:38417 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:35696 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 92
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 0
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 81
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 185
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 214
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 21
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 31
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 67
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 112
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 178
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 208
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 78
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 73
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 131
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 61
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 3
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:38417 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:35696 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:448
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 5 finished: sum at DeltaSync.java:448, took 0.000044 s
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 36
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 80
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 103
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 108
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 183
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 72
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 54
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 132
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 99
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 19
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 93
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 179
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 215
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 66
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 77
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 151
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 116
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 191
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 17
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 14
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 18
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 125
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 204
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 146
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 50
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 56
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 52
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 101
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 221
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 213
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 181
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 190
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 85
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 156
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 161
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 53
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 197
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 20
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 41
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 44
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 140
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 218
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 188
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 122
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 195
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 167
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 220
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 43
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 199
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 155
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 24
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 219
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 71
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 198
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 23
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 135
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 26
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 141
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 121
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 157
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 13
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 130
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned shuffle 0
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 7
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 138
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 63
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 187
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 32
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 196
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 48
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 206
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 119
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 160
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 90
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 40
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 113
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 68
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 224
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 28
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 202
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 10
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 139
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 76
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 49
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 137
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 58
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:38417 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:35696 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 4
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 211
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 212
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 83
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 203
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 33
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 86
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 82
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 95
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 142
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 111
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 98
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 184
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 46
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 129
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 104
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 159
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 59
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 25
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 173
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 79
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 153
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 189
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 51
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:449
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 6 finished: sum at DeltaSync.java:449, took 0.000035 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: collect at SparkRDDWriteClient.java:120
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 7 finished: collect at SparkRDDWriteClient.java:120, took 0.000039 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO util.CommitUtils: Creating metadata for UPSERT numWriteStats:0numReplaceFileIds:0
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Committing 20210526183328 action deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Marking instant complete [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Completed [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED], [==>20210526183328__deltacommit__INFLIGHT], [20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO table.HoodieTimelineArchiveLog: No Instants to archive
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Auto cleaning is enabled. Running cleaner now
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Scheduling cleaning at instant time :20210526183332
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote view for basePath /user/hd_xyz/yyy/ml_xxx/foo. Server=xxx:37089, Timeout=300
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO view.AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:32 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:32 INFO view.RemoteHoodieTableFileSystemView: Sending request : (http://xxx:37089/v1/hoodie/view/compactions/pending/?basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1)
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO collection.RocksDBDAO: DELETING RocksDB persisted at /tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11
21/05/26 18:33:33 INFO collection.RocksDBDAO: No column family found. Loading default
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:230] Creating manifest 1
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3406] Recovering from manifest file: MANIFEST-000001
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [default]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3610] Recovered from manifest file:/tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3618] Column family [default] (ID 0), log number is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:1287] DB pointer 0x7f3aaccf1f20
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:2936] Creating manifest 6
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo] (ID 1)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo] (ID 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo] (ID 3)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo] (ID 4)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 5)
21/05/26 18:33:33 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.4.117:53684) with ID 2
21/05/26 18:33:33 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 6)
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.AbstractTableFileSystemView: Took 9 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing pending compaction operations. Count=0
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing external data file mapping. Count=0
21/05/26 18:33:33 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting file groups in pending clustering to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Created ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix Search for (query=) on hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo. Total Time Taken (msec)=1. Serialization Time taken(micro)=0, num entries=0
21/05/26 18:33:33 INFO service.RequestHandler: TimeTakenMillis[Total=791, Refresh=779, handle=11, Check=1], Success=true, Query=basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1, Host=xxx:37089, synced=false
21/05/26 18:33:33 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:36920 with 912.3 MB RAM, BlockManagerId(2, xxx, 36920, None)
21/05/26 18:33:33 INFO clean.CleanPlanner: No earliest commit to retain. No need to scan partitions !!
21/05/26 18:33:33 INFO clean.CleanPlanner: Nothing to clean here. It is already clean
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaner started
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaned failed attempts if any
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Committed 20210526183328
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling table service COMPACT
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling compaction at instant time :20210526183333
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO compact.SparkScheduleCompactionActionExecutor: Checking if compaction needs to be run on /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO deltastreamer.DeltaSync: Commit 20210526183328 successful!
21/05/26 18:33:33 INFO rdd.MapPartitionsRDD: Removing RDD 29 from persistence list
21/05/26 18:33:33 INFO storage.BlockManager: Removing RDD 29
21/05/26 18:33:34 INFO rdd.MapPartitionsRDD: Removing RDD 37 from persistence list
21/05/26 18:33:34 INFO storage.BlockManager: Removing RDD 37
21/05/26 18:33:34 INFO deltastreamer.DeltaSync: Shutting down embedded timeline server
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closing Timeline server
21/05/26 18:33:34 INFO service.TimelineService: Closing Timeline Service
21/05/26 18:33:34 INFO javalin.Javalin: Stopping Javalin ...
21/05/26 18:33:34 INFO javalin.Javalin: Javalin has stopped
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closing Rocksdb !!
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:365] Shutdown: canceling all background work
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:521] Shutdown complete
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closed Rocksdb !!
21/05/26 18:33:34 INFO service.TimelineService: Closed Timeline Service
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closed Timeline server
21/05/26 18:33:34 INFO deltastreamer.HoodieDeltaStreamer: Shut down delta streamer
21/05/26 18:33:34 INFO server.AbstractConnector: Stopped Spark@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
21/05/26 18:33:34 INFO ui.SparkUI: Stopped Spark web UI at http://xxx:32822
21/05/26 18:33:34 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
21/05/26 18:33:34 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
21/05/26 18:33:34 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
21/05/26 18:33:34 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
21/05/26 18:33:34 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/05/26 18:33:34 INFO memory.MemoryStore: MemoryStore cleared
21/05/26 18:33:34 INFO storage.BlockManager: BlockManager stopped
21/05/26 18:33:34 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/05/26 18:33:34 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/05/26 18:33:34 INFO spark.SparkContext: Successfully stopped SparkContext
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
21/05/26 18:33:34 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
21/05/26 18:33:34 INFO util.ShutdownHookManager: Shutdown hook called
21/05/26 18:33:34 INFO util.ShutdownHookManager: Deleting directory /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/spark-4c7e81b9-e526-4325-abf0-d163828b92b5
[GitHub] [hudi] PavelPetukhov commented on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov commented on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848930327
Below is our full log:
Log Type: stderr
Log Upload Time: Wed May 26 18:33:34 +0300 2021
Log Length: 104910
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for TERM
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for HUP
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for INT
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/05/26 18:33:18 INFO yarn.ApplicationMaster: Preparing Local resources
21/05/26 18:33:19 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/05/26 18:33:19 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1618828995116_0162_000001
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Waiting for spark context initialization...
21/05/26 18:33:19 WARN deltastreamer.SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
21/05/26 18:33:19 INFO spark.SparkContext: Running Spark version 2.4.7
21/05/26 18:33:19 INFO spark.SparkContext: Submitted application: xxx
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'sparkDriver' on port 37691.
21/05/26 18:33:20 INFO spark.SparkEnv: Registering MapOutputTracker
21/05/26 18:33:20 INFO spark.SparkEnv: Registering BlockManagerMaster
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/05/26 18:33:20 INFO storage.DiskBlockManager: Created local directory at /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/blockmgr-9de167db-4756-414e-9126-32cb562e91aa
21/05/26 18:33:20 INFO memory.MemoryStore: MemoryStore started with capacity 912.3 MB
21/05/26 18:33:20 INFO spark.SparkEnv: Registering OutputCommitCoordinator
21/05/26 18:33:20 INFO util.log: Logging initialized @2935ms
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
21/05/26 18:33:20 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
21/05/26 18:33:20 INFO server.Server: Started @3069ms
21/05/26 18:33:20 INFO server.AbstractConnector: Started ServerConnector@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:32822}
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'SparkUI' on port 32822.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@43837fbc{/jobs,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d91ba30{/jobs/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4854d5d9{/jobs/job,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@672e7ec3{/jobs/job/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@67ee182c{/stages,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@97af315{/stages/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1936a0e0{/stages/stage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@447ef19e{/stages/stage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68e36851{/stages/pool,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@352fe12b{/stages/pool/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d39f28d{/storage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e7806b5{/storage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d2a56cb{/storage/rdd,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37c6c6fc{/storage/rdd/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4599e713{/environment,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b9a0cbb{/environment/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@24299f0d{/executors,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@25594c52{/executors/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f728695{/executors/threadDump,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7456a814{/executors/threadDump/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1cef9064{/static,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@16ba2eda{/,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@dac88e2{/api,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@145850ef{/jobs/job/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d678cf2{/stages/stage/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://xxx:32822
21/05/26 18:33:20 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
21/05/26 18:33:20 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1618828995116_0162 and attemptId Some(appattempt_1618828995116_0162_000001)
21/05/26 18:33:20 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:20 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38417.
21/05/26 18:33:20 INFO netty.NettyBlockTransferService: Server created on xxx:38417
21/05/26 18:33:20 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:38417 with 912.3 MB RAM, BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManager: external shuffle service port = 7337
21/05/26 18:33:20 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b3c78ce{/metrics/json,null,AVAILABLE,@Spark}
21/05/26 18:33:21 INFO scheduler.EventLoggingListener: Logging events to hdfs://xxx:8020/eventLogging/application_1618828995116_0162_1
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
21/05/26 18:33:21 INFO client.RMProxy: Connecting to ResourceManager at xxx/10.246.4.117:8030
21/05/26 18:33:21 INFO yarn.YarnRMClient: Registering the ApplicationMaster
21/05/26 18:33:21 INFO yarn.ApplicationMaster:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/conf<CPS>/usr/hdp/2.6.0.3-8/hadoop/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>/usr/hdp/current/ext/hadoop/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR -> hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
SPARK_USER -> hdfs
command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx2048m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.driver.port=37691' \
'-Dspark.ui.port=0' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@xxx:37691 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1618828995116_0162 \
--user-class-path \
file:$PWD/__app__.jar \
--user-class-path \
file:$PWD/org.apache.spark_spark-avro_2.12-2.4.7.jar \
--user-class-path \
file:$PWD/org.spark-project.spark_unused-1.0.0.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
org.apache.spark_spark-avro_2.12-2.4.7.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.apache.spark_spark-avro_2.12-2.4.7.jar" } size: 107269 timestamp: 1622043191967 type: FILE visibility: PRIVATE
__app__.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/jars/hudi/hudi-utilities-bundle_2.12-0.8.0.jar" } size: 40399204 timestamp: 1622022896130 type: FILE visibility: PUBLIC
__spark_conf__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_conf__.zip" } size: 205423 timestamp: 1622043193955 type: ARCHIVE visibility: PRIVATE
org.spark-project.spark_unused-1.0.0.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.spark-project.spark_unused-1.0.0.jar" } size: 2777 timestamp: 1622043192905 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_libs__2858796966972713370.zip" } size: 242613518 timestamp: 1622043190403 type: ARCHIVE visibility: PRIVATE
===============================================================================
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@xxx:37691)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:21 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
21/05/26 18:33:22 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:22 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000002 on host xxx for executor with ID 1
21/05/26 18:33:22 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:25 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.3.9:49980) with ID 1
21/05/26 18:33:25 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
21/05/26 18:33:25 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
21/05/26 18:33:25 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO utilities.UtilHelpers: Adding overridden properties to file properties.
21/05/26 18:33:25 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
21/05/26 18:33:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:35696 with 912.3 MB RAM, BlockManagerId(1, xxx, 35696, None)
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Creating delta streamer with configs : {hoodie.deltastreamer.keygen.timebased.input.timezone=, hoodie.embed.timeline.server=true, schema.registry.url=http://xxx, hoodie.filesystem.view.type=EMBEDDED_KV_STORE, hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-ddTHH:mm:ssZ,yyyy-MM-ddTHH:mm:ss.SSSZ, hoodie.delete.shuffle.parallelism=2, hoodie.bulkinsert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd, group.id=hudi_group_080, auto.offset.reset=earliest, hoodie.insert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator, hoodie.deltastreamer.source.kafka.topic=xxx, bootstrap.servers=xxx:9092, hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=, hoodie.deltastreamer.schemaprovider.registry.url=http://xxx/subjects/xxx-value/versions/latest, hoodie.datasource.write.recordkey.field=id, hoodie.upsert.shuffle.parallelism=2, hoodie.datasource.write.partitionpath.field=date:TIMESTAMP}
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Initializing /user/hd_xyz/yyy/ml_xxx/foo as hoodie table /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished initializing Table of type MERGE_ON_READ from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Delta Streamer running only single round
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:26 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:26 INFO deltastreamer.DeltaSync: Checkpoint to resume from : Optional.empty
21/05/26 18:33:26 INFO consumer.ConsumerConfig: ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [xxx]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = hudi_group_080
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class io.confluent.kafka.serializers.KafkaAvroDeserializer
21/05/26 18:33:26 INFO serializers.KafkaAvroDeserializerConfig: KafkaAvroDeserializerConfig values:
schema.registry.url = [xxx]
max.schemas.per.subject = 1000
specific.avro.reader = false
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.timestamp.type' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.output.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.delete.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.upsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.keygenerator.class' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.schemaprovider.registry.url' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.insert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.embed.timeline.server' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.bulkinsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.timezone' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.filesystem.view.type' was supplied but isn't a known config.
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka version: 2.4.1
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka commitId: c57222ae8cd7866b
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka startTimeMs: 1622043206225
21/05/26 18:33:26 INFO clients.Metadata: [Consumer clientId=consumer-hudi_group_080-1, groupId=hudi_group_080] Cluster ID: 5XoPi9AYT0mbHVQEj6VEaw
21/05/26 18:33:27 INFO helpers.KafkaOffsetGen: SourceLimit not configured, set numEvents to default value : 5000000
21/05/26 18:33:27 INFO sources.AvroKafkaSource: About to read 0 from Kafka for topic :xxx
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: No new data, perform empty commit.
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Setting up new Hoodie Write Client
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Starting Timeline service !!
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Overriding hostIp to (xxx) found in spark-conf. It was null
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :EMBEDDED_KV_STORE
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating embedded rocks-db based Table View
21/05/26 18:33:27 INFO util.log: Logging initialized @9978ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
21/05/26 18:33:27 INFO javalin.Javalin:
           __                      __ _
          / /____ _ _   __ ____ _ / /(_)____
     __  / // __ `/| | / // __ `// // // __ \
    / /_/ // /_/ / | |/ // /_/ // // // / / /
    \____/ \__,_/ |___/ \__,_//_//_//_/ /_/
https://javalin.io/documentation
21/05/26 18:33:27 INFO javalin.Javalin: Starting Javalin ...
21/05/26 18:33:27 INFO javalin.Javalin: Listening on http://localhost:37089/
21/05/26 18:33:27 INFO javalin.Javalin: Javalin started in 179ms \o/
21/05/26 18:33:27 INFO service.TimelineService: Starting Timeline server on port :37089
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Started embedded timeline server at xxx:37089
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO client.AbstractHoodieClient: Timeline Server already running. Not restarting the service
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO client.AbstractHoodieWriteClient: Generate a new instant time: 20210526183328 action: deltacommit
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Creating a new instant [==>20210526183328__deltacommit__REQUESTED]
21/05/26 18:33:28 INFO deltastreamer.DeltaSync: Starting commit : 20210526183328
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED]]
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:28 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:28 INFO client.AsyncCleanerService: Auto cleaning is not enabled. Not running cleaner now
21/05/26 18:33:28 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:114
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 1 (mapToPair at SparkWriteHelper.java:54) as input to shuffle 1
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 5 (countByKey at SparkHoodieBloomIndex.java:114) as input to shuffle 0
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Got job 0 (countByKey at SparkHoodieBloomIndex.java:114) with 2 output partitions
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.2 KB, free 912.3 MB)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Driver requested a total number of 2 executor(s).
21/05/26 18:33:28 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:28 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.3 KB, free 912.3 MB)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:28 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:28 INFO cluster.YarnClusterScheduler: Adding task set 1.0 with 2 tasks
21/05/26 18:33:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:29 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000004 on host xxx for executor with ID 2
21/05/26 18:33:29 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_0 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_1 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1023 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 70 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (countByKey at SparkHoodieBloomIndex.java:114) finished in 1.177 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.8 KB, free 912.3 MB)
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.2 KB, free 912.3 MB)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Adding task set 2.0 with 2 tasks
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.246.3.9:49980
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 3, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 85 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 3) in 32 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114) finished in 0.126 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Job 0 finished: countByKey at SparkHoodieBloomIndex.java:114, took 1.627903 s
21/05/26 18:33:29 INFO yarn.YarnAllocator: Driver requested a total number of 1 executor(s).
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 1 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (collect at HoodieSparkEngineContext.java:78)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 368.5 KB, free 911.9 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 101.0 KB, free 911.8 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:38417 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 3.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:35696 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 178 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 3 (collect at HoodieSparkEngineContext.java:78) finished in 0.233 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 1 finished: collect at HoodieSparkEngineContext.java:78, took 0.236923 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:73
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 2 (collect at HoodieSparkEngineContext.java:73) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (collect at HoodieSparkEngineContext.java:73)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 368.3 KB, free 911.5 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 100.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:38417 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 4.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 5, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:35696 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 5) in 94 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 4.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 4 (collect at HoodieSparkEngineContext.java:73) finished in 0.167 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 2 finished: collect at HoodieSparkEngineContext.java:73, took 0.174163 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:149
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Registering RDD 14 (countByKey at SparkHoodieBloomIndex.java:149) as input to shuffle 2
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 3 (countByKey at SparkHoodieBloomIndex.java:149) with 2 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 7.5 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:38417 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 6.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 6, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:35696 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 6.0 (TID 7, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 6.0 (TID 6) in 60 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 6.0 (TID 7) in 36 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ShuffleMapStage 6 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.121 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:30 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 7)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.8 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 7.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 7.0 (TID 8, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 10.246.3.9:49980
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 7.0 (TID 9, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 7.0 (TID 8) in 47 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 7.0 (TID 9) in 20 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 7.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.081 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 3 finished: countByKey at SparkHoodieBloomIndex.java:149, took 0.219895 s
21/05/26 18:33:30 INFO bloom.SparkHoodieBloomIndex: InputParallelism: ${2}, IndexParallelism: ${0}
21/05/26 18:33:30 INFO bloom.BucketizedBloomCheckPartitioner: TotalBuckets 0, min_buckets/partition 1
21/05/26 18:33:30 INFO rdd.MapPartitionsRDD: Removing RDD 3 from persistence list
21/05/26 18:33:30 INFO storage.BlockManager: Removing RDD 3
21/05/26 18:33:31 INFO rdd.MapPartitionsRDD: Removing RDD 22 from persistence list
21/05/26 18:33:31 INFO storage.BlockManager: Removing RDD 22
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: countByKey at BaseSparkCommitActionExecutor.java:158
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 16 (mapToPair at SparkHoodieBloomIndex.java:266) as input to shuffle 6
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 23 (mapToPair at SparkHoodieBloomIndex.java:287) as input to shuffle 3
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 22 (flatMapToPair at SparkHoodieBloomIndex.java:274) as input to shuffle 4
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 31 (countByKey at BaseSparkCommitActionExecutor.java:158) as input to shuffle 5
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Got job 4 (countByKey at BaseSparkCommitActionExecutor.java:158) with 2 output partitions
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 5.9 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 3.3 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 10.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 10.0 (TID 10, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 10.0 (TID 11, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 10.0 (TID 10) in 50 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 10.0 (TID 11) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 10.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 10 (mapToPair at SparkHoodieBloomIndex.java:287) finished in 0.092 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 12, ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 7.1 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:38417 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 12.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 12.0 (TID 12, xxx, executor 1, partition 0, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:35696 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to 10.246.3.9:49980
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 4 to 10.246.3.9:49980
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_0 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 12.0 (TID 13, xxx, executor 1, partition 1, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 12.0 (TID 12) in 105 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_1 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 12.0 (TID 13) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 12.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 12 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.146 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 13.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 14, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 5 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 13.0 (TID 15, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 13.0 (TID 14) in 31 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 13.0 (TID 15) in 12 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 13.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.064 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 4 finished: countByKey at BaseSparkCommitActionExecutor.java:158, took 0.320123 s
21/05/26 18:33:31 INFO commit.BaseSparkCommitActionExecutor: Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=0, numUpdates=0}, partitionStat={}, operationType=UPSERT}
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.requested
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:31 INFO commit.UpsertPartitioner: AvgRecordSize => 1024
21/05/26 18:33:31 INFO view.AbstractTableFileSystemView: Took 3 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:31 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:31 INFO commit.UpsertPartitioner: Total Buckets :0, buckets info => {},
Partition to insert buckets => {},
UpdateLocations mapped to buckets =>{}
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 175
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 62
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 9
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 148
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 105
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 143
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 55
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 209
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 154
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 147
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 163
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 69
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 34
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 100
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 1
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 193
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 169
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 27
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 16
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 115
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 120
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 106
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 174
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 210
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 96
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 6
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 57
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 133
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 11
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 74
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 107
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 164
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 172
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 176
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 194
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 109
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 37
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 177
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 128
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 182
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 205
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 30
21/05/26 18:33:31 INFO commit.BaseCommitActionExecutor: Auto commit disabled for 20210526183328
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 102
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 180
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 150
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 186
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 89
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 223
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 47
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 158
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 162
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 88
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 39
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 8
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 29
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 124
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 75
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 165
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 217
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 134
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 35
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 216
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 22
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 114
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 152
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 42
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 94
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 145
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 126
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 144
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 168
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:38417 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:35696 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 149
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 38
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 70
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 15
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 118
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 166
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 207
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 170
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 171
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 65
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 97
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 110
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 222
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 87
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 192
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 201
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 117
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 123
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 12
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 60
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 84
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 127
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 91
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 136
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 45
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 200
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 64
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:38417 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:35696 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 92
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 0
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 81
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 185
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 214
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 21
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 31
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 67
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 112
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 178
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 208
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 78
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 73
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 131
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 61
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 3
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:38417 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:35696 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:448
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 5 finished: sum at DeltaSync.java:448, took 0.000044 s
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 36
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 80
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 103
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 108
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 183
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 72
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 54
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 132
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 99
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 19
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 93
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 179
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 215
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 66
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 77
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 151
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 116
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 191
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 17
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 14
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 18
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 125
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 204
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 146
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 50
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 56
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 52
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 101
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 221
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 213
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 181
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 190
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 85
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 156
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 161
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 53
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 197
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 20
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 41
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 44
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 140
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 218
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 188
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 122
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 195
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 167
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 220
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 43
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 199
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 155
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 24
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 219
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 71
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 198
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 23
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 135
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 26
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 141
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 121
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 157
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 13
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 130
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned shuffle 0
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 7
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 138
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 63
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 187
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 32
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 196
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 48
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 206
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 119
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 160
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 90
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 40
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 113
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 68
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 224
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 28
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 202
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 10
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 139
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 76
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 49
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 137
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 58
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:38417 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:35696 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 4
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 211
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 212
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 83
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 203
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 33
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 86
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 82
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 95
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 142
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 111
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 98
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 184
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 46
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 129
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 104
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 159
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 59
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 25
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 173
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 79
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 153
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 189
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 51
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:449
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 6 finished: sum at DeltaSync.java:449, took 0.000035 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: collect at SparkRDDWriteClient.java:120
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 7 finished: collect at SparkRDDWriteClient.java:120, took 0.000039 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO util.CommitUtils: Creating metadata for UPSERT numWriteStats:0numReplaceFileIds:0
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Committing 20210526183328 action deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Marking instant complete [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Completed [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED], [==>20210526183328__deltacommit__INFLIGHT], [20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO table.HoodieTimelineArchiveLog: No Instants to archive
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Auto cleaning is enabled. Running cleaner now
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Scheduling cleaning at instant time :20210526183332
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote view for basePath /user/hd_xyz/yyy/ml_xxx/foo. Server=xxx:37089, Timeout=300
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO view.AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:32 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:32 INFO view.RemoteHoodieTableFileSystemView: Sending request : (http://xxx:37089/v1/hoodie/view/compactions/pending/?basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1)
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO collection.RocksDBDAO: DELETING RocksDB persisted at /tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11
21/05/26 18:33:33 INFO collection.RocksDBDAO: No column family found. Loading default
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:230] Creating manifest 1
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3406] Recovering from manifest file: MANIFEST-000001
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [default]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3610] Recovered from manifest file:/tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3618] Column family [default] (ID 0), log number is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:1287] DB pointer 0x7f3aaccf1f20
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:2936] Creating manifest 6
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo] (ID 1)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo] (ID 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo] (ID 3)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo] (ID 4)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 5)
21/05/26 18:33:33 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.4.117:53684) with ID 2
21/05/26 18:33:33 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 6)
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.AbstractTableFileSystemView: Took 9 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing pending compaction operations. Count=0
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing external data file mapping. Count=0
21/05/26 18:33:33 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting file groups in pending clustering to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Created ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix Search for (query=) on hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo. Total Time Taken (msec)=1. Serialization Time taken(micro)=0, num entries=0
21/05/26 18:33:33 INFO service.RequestHandler: TimeTakenMillis[Total=791, Refresh=779, handle=11, Check=1], Success=true, Query=basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1, Host=xxx:37089, synced=false
21/05/26 18:33:33 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:36920 with 912.3 MB RAM, BlockManagerId(2, xxx, 36920, None)
21/05/26 18:33:33 INFO clean.CleanPlanner: No earliest commit to retain. No need to scan partitions !!
21/05/26 18:33:33 INFO clean.CleanPlanner: Nothing to clean here. It is already clean
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaner started
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaned failed attempts if any
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Committed 20210526183328
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling table service COMPACT
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling compaction at instant time :20210526183333
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO compact.SparkScheduleCompactionActionExecutor: Checking if compaction needs to be run on /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO deltastreamer.DeltaSync: Commit 20210526183328 successful!
21/05/26 18:33:33 INFO rdd.MapPartitionsRDD: Removing RDD 29 from persistence list
21/05/26 18:33:33 INFO storage.BlockManager: Removing RDD 29
21/05/26 18:33:34 INFO rdd.MapPartitionsRDD: Removing RDD 37 from persistence list
21/05/26 18:33:34 INFO storage.BlockManager: Removing RDD 37
21/05/26 18:33:34 INFO deltastreamer.DeltaSync: Shutting down embedded timeline server
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closing Timeline server
21/05/26 18:33:34 INFO service.TimelineService: Closing Timeline Service
21/05/26 18:33:34 INFO javalin.Javalin: Stopping Javalin ...
21/05/26 18:33:34 INFO javalin.Javalin: Javalin has stopped
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closing Rocksdb !!
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:365] Shutdown: canceling all background work
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:521] Shutdown complete
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closed Rocksdb !!
21/05/26 18:33:34 INFO service.TimelineService: Closed Timeline Service
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closed Timeline server
21/05/26 18:33:34 INFO deltastreamer.HoodieDeltaStreamer: Shut down delta streamer
21/05/26 18:33:34 INFO server.AbstractConnector: Stopped Spark@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
21/05/26 18:33:34 INFO ui.SparkUI: Stopped Spark web UI at http://xxx:32822
21/05/26 18:33:34 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
21/05/26 18:33:34 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
21/05/26 18:33:34 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
21/05/26 18:33:34 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
21/05/26 18:33:34 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/05/26 18:33:34 INFO memory.MemoryStore: MemoryStore cleared
21/05/26 18:33:34 INFO storage.BlockManager: BlockManager stopped
21/05/26 18:33:34 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/05/26 18:33:34 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/05/26 18:33:34 INFO spark.SparkContext: Successfully stopped SparkContext
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
21/05/26 18:33:34 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
21/05/26 18:33:34 INFO util.ShutdownHookManager: Shutdown hook called
21/05/26 18:33:34 INFO util.ShutdownHookManager: Deleting directory /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/spark-4c7e81b9-e526-4325-abf0-d163828b92b5
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] n3nash commented on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
n3nash commented on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-847396093
@PavelPetukhov
I'm guessing there's more in the logs to tell us what is happening, even though the status is `SUCCEEDED`.
1. Can you paste the contents of the `.hoodie` directory, please?
2. Can you look for any ERRORs/exceptions in the logs?
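One way to do the second check is to filter the driver log by level. The following is a minimal sketch, assuming the log uses the `yy/MM/dd HH:mm:ss LEVEL logger: message` layout seen in this thread; the function name `find_suspicious_lines` is hypothetical, and the two sample lines are taken from the log posted here.

```python
import re

# Matches the layout seen in this thread: "yy/MM/dd HH:mm:ss LEVEL logger: message".
# Only WARN, ERROR and FATAL are treated as suspicious levels.
LEVEL_RE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (WARN|ERROR|FATAL) ")


def find_suspicious_lines(log_text):
    """Return (line_number, line) pairs that are WARN/ERROR/FATAL or mention an exception."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if LEVEL_RE.match(line) or "Exception" in line:
            hits.append((number, line))
    return hits


# Two lines copied from the stderr log in this thread.
sample_log = "\n".join([
    "21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for TERM",
    "21/05/26 18:33:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library "
    "for your platform... using builtin-java classes where applicable",
])

for number, line in find_suspicious_lines(sample_log):
    print(f"{number}: {line}")
```

Stack traces usually span several lines without a timestamp, so the `"Exception"` substring check is there to catch the first line of a trace even when the level prefix is missing.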
[GitHub] [hudi] PavelPetukhov edited a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov edited a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848885930
The .hoodie directory structure is as follows:
hdfs dfs -ls /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie
Found 7 items
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/.aux
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/.temp
-rw-r--r-- 3 hdfs hadoop 1201 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/20210526183328.deltacommit
-rw-r--r-- 3 hdfs hadoop 518 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/20210526183328.deltacommit.inflight
-rw-r--r-- 3 hdfs hadoop 0 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/20210526183328.deltacommit.requested
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/archived
-rw-r--r-- 3 hdfs hadoop 391 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/hoodie.properties
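The listing shows a completed `20210526183328.deltacommit` file, and the driver log for this run reports `numWriteStats:0` when creating the commit metadata, so it may help to confirm whether the commit itself records any written files. The following is a minimal sketch, assuming the completed `.deltacommit` file is JSON with a `partitionToWriteStats` map whose entries carry a `numWrites` count; these field names are assumptions to check against the actual file, the function name `summarize_commit` is hypothetical, and the sample content is made up for illustration.

```python
import json


def summarize_commit(metadata_json):
    """Count partitions, write stats and written records in a commit metadata JSON string."""
    metadata = json.loads(metadata_json)
    # Assumed layout: {"partitionToWriteStats": {partition: [{"numWrites": n, ...}, ...]}}
    stats_by_partition = metadata.get("partitionToWriteStats") or {}
    stats = [s for stats_list in stats_by_partition.values() for s in stats_list]
    return {
        "partitions": len(stats_by_partition),
        "write_stats": len(stats),
        "records_written": sum(s.get("numWrites", 0) for s in stats),
    }


# Hypothetical content of a commit that wrote nothing (not copied from the real file).
empty_commit = '{"partitionToWriteStats": {}, "compacted": false, "extraMetadata": {}}'
print(summarize_commit(empty_commit))
```

The real file content can be fetched with `hdfs dfs -cat` on the `.deltacommit` path from the listing and passed to the function as a string. If every count is zero, the job committed an empty batch, which would be consistent with a `SUCCEEDED` status and no data files.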
Also, I have removed everything unrelated, so the request looks like this:
/usr/local/spark/bin/spark-submit --conf "spark.yarn.submit.waitAppCompletion=false" \
--conf "spark.dynamicAllocation.minExecutors=1" \
--conf "spark.dynamicAllocation.maxExecutors=10" \
--conf "spark.dynamicAllocation.enabled=true" \
--conf "spark.dynamicAllocation.shuffleTracking.enabled=true" \
--conf "spark.shuffle.service.enabled=true" \
--conf "spark.eventLog.enabled=true" \
--conf "spark.eventLog.dir=hdfs://xxx/eventLogging" \
--conf "spark.executor.memoryOverhead=384" \
--conf "spark.driver.memoryOverhead=384" \
--conf "spark.driver.extraJavaOptions=-DsparkAappName=xxx -DlogIndex=GOLANG_JSON -DappName=data-lake-extractors-streamer -DlogFacility=stdout" \
--packages org.apache.spark:spark-avro_2.12:2.4.7 \
--master yarn \
--deploy-mode cluster \
--name xxx \
--driver-memory 2G \
--executor-memory 2G \
--class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
hdfs://xxx/user/hudi/hudi-utilities-bundle_2.12-0.8.0.jar \
--op UPSERT \
--table-type MERGE_ON_READ \
--source-class org.apache.hudi.utilities.sources.AvroKafkaSource \
--source-ordering-field __null_ts_ms \
--schemaprovider-class org.apache.hudi.utilities.schema.SchemaRegistryProvider \
--target-base-path /user/hdfs/raw_data/public/xxx/yyy \
--target-table xxx \
--hoodie-conf "hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-ddTHH:mm:ssZ,yyyy-MM-ddTHH:mm:ss.SSSZ" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=" \
--hoodie-conf "hoodie.deltastreamer.keygen.timebased.input.timezone=" \
--hoodie-conf "hoodie.upsert.shuffle.parallelism=2" \
--hoodie-conf "hoodie.insert.shuffle.parallelism=2" \
--hoodie-conf "hoodie.delete.shuffle.parallelism=2" \
--hoodie-conf "hoodie.bulkinsert.shuffle.parallelism=2" \
--hoodie-conf "hoodie.embed.timeline.server=true" \
--hoodie-conf "hoodie.filesystem.view.type=EMBEDDED_KV_STORE" \
--hoodie-conf "hoodie.deltastreamer.schemaprovider.registry.url=http://xxx/subjects/xxx-value/versions/latest" \
--hoodie-conf "bootstrap.servers=xxx" \
--hoodie-conf "auto.offset.reset=earliest" \
--hoodie-conf "group.id=hudi_group" \
--hoodie-conf "schema.registry.url=http://xxx" \
--hoodie-conf "hoodie.datasource.write.recordkey.field=id" \
--hoodie-conf "hoodie.datasource.write.partitionpath.field=date:TIMESTAMP" \
--hoodie-conf "hoodie.deltastreamer.source.kafka.topic=xxx" \
[GitHub] [hudi] PavelPetukhov edited a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov edited a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848930327
Below is our full log:
Log Type: stderr
Log Upload Time: Wed May 26 18:33:34 +0300 2021
Log Length: 104910
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for TERM
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for HUP
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for INT
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/05/26 18:33:18 INFO yarn.ApplicationMaster: Preparing Local resources
21/05/26 18:33:19 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/05/26 18:33:19 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1618828995116_0162_000001
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Waiting for spark context initialization...
21/05/26 18:33:19 WARN deltastreamer.SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
21/05/26 18:33:19 INFO spark.SparkContext: Running Spark version 2.4.7
21/05/26 18:33:19 INFO spark.SparkContext: Submitted application: xxx
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'sparkDriver' on port 37691.
21/05/26 18:33:20 INFO spark.SparkEnv: Registering MapOutputTracker
21/05/26 18:33:20 INFO spark.SparkEnv: Registering BlockManagerMaster
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/05/26 18:33:20 INFO storage.DiskBlockManager: Created local directory at /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/blockmgr-9de167db-4756-414e-9126-32cb562e91aa
21/05/26 18:33:20 INFO memory.MemoryStore: MemoryStore started with capacity 912.3 MB
21/05/26 18:33:20 INFO spark.SparkEnv: Registering OutputCommitCoordinator
21/05/26 18:33:20 INFO util.log: Logging initialized @2935ms
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
21/05/26 18:33:20 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
21/05/26 18:33:20 INFO server.Server: Started @3069ms
21/05/26 18:33:20 INFO server.AbstractConnector: Started ServerConnector@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:32822}
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'SparkUI' on port 32822.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@43837fbc{/jobs,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d91ba30{/jobs/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4854d5d9{/jobs/job,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@672e7ec3{/jobs/job/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@67ee182c{/stages,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@97af315{/stages/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1936a0e0{/stages/stage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@447ef19e{/stages/stage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68e36851{/stages/pool,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@352fe12b{/stages/pool/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d39f28d{/storage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e7806b5{/storage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d2a56cb{/storage/rdd,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37c6c6fc{/storage/rdd/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4599e713{/environment,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b9a0cbb{/environment/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@24299f0d{/executors,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@25594c52{/executors/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f728695{/executors/threadDump,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7456a814{/executors/threadDump/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1cef9064{/static,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@16ba2eda{/,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@dac88e2{/api,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@145850ef{/jobs/job/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d678cf2{/stages/stage/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://xxx:32822
21/05/26 18:33:20 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
21/05/26 18:33:20 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1618828995116_0162 and attemptId Some(appattempt_1618828995116_0162_000001)
21/05/26 18:33:20 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:20 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38417.
21/05/26 18:33:20 INFO netty.NettyBlockTransferService: Server created on xxx:38417
21/05/26 18:33:20 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:38417 with 912.3 MB RAM, BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManager: external shuffle service port = 7337
21/05/26 18:33:20 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b3c78ce{/metrics/json,null,AVAILABLE,@Spark}
21/05/26 18:33:21 INFO scheduler.EventLoggingListener: Logging events to hdfs://xxx:8020/eventLogging/application_1618828995116_0162_1
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
21/05/26 18:33:21 INFO client.RMProxy: Connecting to ResourceManager at xxx/10.246.4.117:8030
21/05/26 18:33:21 INFO yarn.YarnRMClient: Registering the ApplicationMaster
21/05/26 18:33:21 INFO yarn.ApplicationMaster:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/conf<CPS>/usr/hdp/2.6.0.3-8/hadoop/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>/usr/hdp/current/ext/hadoop/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR -> hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
SPARK_USER -> hdfs
command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx2048m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.driver.port=37691' \
'-Dspark.ui.port=0' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@xxx:37691 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1618828995116_0162 \
--user-class-path \
file:$PWD/__app__.jar \
--user-class-path \
file:$PWD/org.apache.spark_spark-avro_2.12-2.4.7.jar \
--user-class-path \
file:$PWD/org.spark-project.spark_unused-1.0.0.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
org.apache.spark_spark-avro_2.12-2.4.7.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.apache.spark_spark-avro_2.12-2.4.7.jar" } size: 107269 timestamp: 1622043191967 type: FILE visibility: PRIVATE
__app__.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/jars/hudi/hudi-utilities-bundle_2.12-0.8.0.jar" } size: 40399204 timestamp: 1622022896130 type: FILE visibility: PUBLIC
__spark_conf__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_conf__.zip" } size: 205423 timestamp: 1622043193955 type: ARCHIVE visibility: PRIVATE
org.spark-project.spark_unused-1.0.0.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.spark-project.spark_unused-1.0.0.jar" } size: 2777 timestamp: 1622043192905 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_libs__2858796966972713370.zip" } size: 242613518 timestamp: 1622043190403 type: ARCHIVE visibility: PRIVATE
===============================================================================
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@xxx:37691)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:21 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
21/05/26 18:33:22 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:22 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000002 on host xxx for executor with ID 1
21/05/26 18:33:22 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:25 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.3.9:49980) with ID 1
21/05/26 18:33:25 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
21/05/26 18:33:25 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
21/05/26 18:33:25 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO utilities.UtilHelpers: Adding overridden properties to file properties.
21/05/26 18:33:25 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
21/05/26 18:33:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:35696 with 912.3 MB RAM, BlockManagerId(1, xxx, 35696, None)
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Creating delta streamer with configs : {hoodie.deltastreamer.keygen.timebased.input.timezone=, hoodie.embed.timeline.server=true, schema.registry.url=http://xxx, hoodie.filesystem.view.type=EMBEDDED_KV_STORE, hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-ddTHH:mm:ssZ,yyyy-MM-ddTHH:mm:ss.SSSZ, hoodie.delete.shuffle.parallelism=2, hoodie.bulkinsert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd, group.id=hudi_group_080, auto.offset.reset=earliest, hoodie.insert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator, hoodie.deltastreamer.source.kafka.topic=xxx, bootstrap.servers=xxx:9092, hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=, hoodie.deltastreamer.schemaprovider.registry.url=http://xxx/subjects/xxx-value/versions/latest, hoodie.datasource.write.recordkey.field=id, hoodie.upsert.shuffle.parallelism=2, hoodie.datasource.write.partitionpath.field=date:TIMESTAMP}
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Initializing /user/hd_xyz/yyy/ml_xxx/foo as hoodie table /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished initializing Table of type MERGE_ON_READ from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Delta Streamer running only single round
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:26 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:26 INFO deltastreamer.DeltaSync: Checkpoint to resume from : Optional.empty
21/05/26 18:33:26 INFO consumer.ConsumerConfig: ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [xxx]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = hudi_group_080
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class io.confluent.kafka.serializers.KafkaAvroDeserializer
21/05/26 18:33:26 INFO serializers.KafkaAvroDeserializerConfig: KafkaAvroDeserializerConfig values:
schema.registry.url = [xxx]
max.schemas.per.subject = 1000
specific.avro.reader = false
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.timestamp.type' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.output.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.delete.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.upsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.keygenerator.class' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.schemaprovider.registry.url' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.insert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.embed.timeline.server' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.bulkinsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.timezone' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.filesystem.view.type' was supplied but isn't a known config.
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka version: 2.4.1
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka commitId: c57222ae8cd7866b
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka startTimeMs: 1622043206225
21/05/26 18:33:26 INFO clients.Metadata: [Consumer clientId=consumer-hudi_group_080-1, groupId=hudi_group_080] Cluster ID: 5XoPi9AYT0mbHVQEj6VEaw
21/05/26 18:33:27 INFO helpers.KafkaOffsetGen: SourceLimit not configured, set numEvents to default value : 5000000
21/05/26 18:33:27 INFO sources.AvroKafkaSource: About to read 0 from Kafka for topic :xxx
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: No new data, perform empty commit.
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Setting up new Hoodie Write Client
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Starting Timeline service !!
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Overriding hostIp to (xxx) found in spark-conf. It was null
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :EMBEDDED_KV_STORE
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating embedded rocks-db based Table View
21/05/26 18:33:27 INFO util.log: Logging initialized @9978ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
21/05/26 18:33:27 INFO javalin.Javalin:
__ __ _
/ /____ _ _ __ ____ _ / /(_)____
__ / // __ `/| | / // __ `// // // __ \
/ /_/ // /_/ / | |/ // /_/ // // // / / /
\____/ \__,_/ |___/ \__,_//_//_//_/ /_/
https://javalin.io/documentation
21/05/26 18:33:27 INFO javalin.Javalin: Starting Javalin ...
21/05/26 18:33:27 INFO javalin.Javalin: Listening on http://localhost:37089/
21/05/26 18:33:27 INFO javalin.Javalin: Javalin started in 179ms \o/
21/05/26 18:33:27 INFO service.TimelineService: Starting Timeline server on port :37089
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Started embedded timeline server at xxx:37089
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO client.AbstractHoodieClient: Timeline Server already running. Not restarting the service
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO client.AbstractHoodieWriteClient: Generate a new instant time: 20210526183328 action: deltacommit
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Creating a new instant [==>20210526183328__deltacommit__REQUESTED]
21/05/26 18:33:28 INFO deltastreamer.DeltaSync: Starting commit : 20210526183328
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED]]
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:28 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:28 INFO client.AsyncCleanerService: Auto cleaning is not enabled. Not running cleaner now
21/05/26 18:33:28 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:114
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 1 (mapToPair at SparkWriteHelper.java:54) as input to shuffle 1
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 5 (countByKey at SparkHoodieBloomIndex.java:114) as input to shuffle 0
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Got job 0 (countByKey at SparkHoodieBloomIndex.java:114) with 2 output partitions
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.2 KB, free 912.3 MB)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Driver requested a total number of 2 executor(s).
21/05/26 18:33:28 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:28 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.3 KB, free 912.3 MB)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:28 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:28 INFO cluster.YarnClusterScheduler: Adding task set 1.0 with 2 tasks
21/05/26 18:33:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:29 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000004 on host xxx for executor with ID 2
21/05/26 18:33:29 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_0 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_1 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1023 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 70 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (countByKey at SparkHoodieBloomIndex.java:114) finished in 1.177 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.8 KB, free 912.3 MB)
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.2 KB, free 912.3 MB)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Adding task set 2.0 with 2 tasks
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.246.3.9:49980
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 3, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 85 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 3) in 32 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114) finished in 0.126 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Job 0 finished: countByKey at SparkHoodieBloomIndex.java:114, took 1.627903 s
21/05/26 18:33:29 INFO yarn.YarnAllocator: Driver requested a total number of 1 executor(s).
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 1 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (collect at HoodieSparkEngineContext.java:78)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 368.5 KB, free 911.9 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 101.0 KB, free 911.8 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:38417 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 3.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:35696 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 178 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 3 (collect at HoodieSparkEngineContext.java:78) finished in 0.233 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 1 finished: collect at HoodieSparkEngineContext.java:78, took 0.236923 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:73
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 2 (collect at HoodieSparkEngineContext.java:73) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (collect at HoodieSparkEngineContext.java:73)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 368.3 KB, free 911.5 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 100.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:38417 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 4.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 5, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:35696 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 5) in 94 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 4.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 4 (collect at HoodieSparkEngineContext.java:73) finished in 0.167 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 2 finished: collect at HoodieSparkEngineContext.java:73, took 0.174163 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:149
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Registering RDD 14 (countByKey at SparkHoodieBloomIndex.java:149) as input to shuffle 2
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 3 (countByKey at SparkHoodieBloomIndex.java:149) with 2 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 7.5 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:38417 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 6.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 6, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:35696 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 6.0 (TID 7, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 6.0 (TID 6) in 60 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 6.0 (TID 7) in 36 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ShuffleMapStage 6 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.121 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:30 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 7)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.8 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 7.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 7.0 (TID 8, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 10.246.3.9:49980
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 7.0 (TID 9, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 7.0 (TID 8) in 47 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 7.0 (TID 9) in 20 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 7.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.081 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 3 finished: countByKey at SparkHoodieBloomIndex.java:149, took 0.219895 s
21/05/26 18:33:30 INFO bloom.SparkHoodieBloomIndex: InputParallelism: ${2}, IndexParallelism: ${0}
21/05/26 18:33:30 INFO bloom.BucketizedBloomCheckPartitioner: TotalBuckets 0, min_buckets/partition 1
21/05/26 18:33:30 INFO rdd.MapPartitionsRDD: Removing RDD 3 from persistence list
21/05/26 18:33:30 INFO storage.BlockManager: Removing RDD 3
21/05/26 18:33:31 INFO rdd.MapPartitionsRDD: Removing RDD 22 from persistence list
21/05/26 18:33:31 INFO storage.BlockManager: Removing RDD 22
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: countByKey at BaseSparkCommitActionExecutor.java:158
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 16 (mapToPair at SparkHoodieBloomIndex.java:266) as input to shuffle 6
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 23 (mapToPair at SparkHoodieBloomIndex.java:287) as input to shuffle 3
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 22 (flatMapToPair at SparkHoodieBloomIndex.java:274) as input to shuffle 4
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 31 (countByKey at BaseSparkCommitActionExecutor.java:158) as input to shuffle 5
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Got job 4 (countByKey at BaseSparkCommitActionExecutor.java:158) with 2 output partitions
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 5.9 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 3.3 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 10.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 10.0 (TID 10, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 10.0 (TID 11, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 10.0 (TID 10) in 50 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 10.0 (TID 11) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 10.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 10 (mapToPair at SparkHoodieBloomIndex.java:287) finished in 0.092 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 12, ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 7.1 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:38417 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 12.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 12.0 (TID 12, xxx, executor 1, partition 0, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:35696 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to 10.246.3.9:49980
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 4 to 10.246.3.9:49980
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_0 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 12.0 (TID 13, xxx, executor 1, partition 1, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 12.0 (TID 12) in 105 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_1 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 12.0 (TID 13) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 12.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 12 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.146 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 13.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 14, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 5 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 13.0 (TID 15, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 13.0 (TID 14) in 31 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 13.0 (TID 15) in 12 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 13.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.064 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 4 finished: countByKey at BaseSparkCommitActionExecutor.java:158, took 0.320123 s
21/05/26 18:33:31 INFO commit.BaseSparkCommitActionExecutor: Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=0, numUpdates=0}, partitionStat={}, operationType=UPSERT}
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.requested
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:31 INFO commit.UpsertPartitioner: AvgRecordSize => 1024
21/05/26 18:33:31 INFO view.AbstractTableFileSystemView: Took 3 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:31 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:31 INFO commit.UpsertPartitioner: Total Buckets :0, buckets info => {},
Partition to insert buckets => {},
UpdateLocations mapped to buckets =>{}
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 175
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 62
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 9
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 148
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 105
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 143
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 55
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 209
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 154
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 147
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 163
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 69
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 34
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 100
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 1
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 193
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 169
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 27
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 16
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 115
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 120
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 106
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 174
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 210
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 96
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 6
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 57
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 133
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 11
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 74
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 107
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 164
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 172
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 176
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 194
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 109
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 37
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 177
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 128
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 182
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 205
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 30
21/05/26 18:33:31 INFO commit.BaseCommitActionExecutor: Auto commit disabled for 20210526183328
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 102
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 180
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 150
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 186
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 89
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 223
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 47
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 158
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 162
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 88
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 39
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 8
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 29
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 124
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 75
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 165
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 217
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 134
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 35
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 216
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 22
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 114
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 152
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 42
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 94
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 145
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 126
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 144
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 168
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:38417 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:35696 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 149
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 38
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 70
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 15
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 118
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 166
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 207
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 170
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 171
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 65
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 97
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 110
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 222
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 87
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 192
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 201
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 117
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 123
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 12
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 60
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 84
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 127
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 91
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 136
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 45
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 200
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 64
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:38417 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:35696 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 92
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 0
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 81
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 185
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 214
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 21
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 31
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 67
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 112
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 178
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 208
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 78
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 73
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 131
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 61
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 3
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:38417 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:35696 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:448
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 5 finished: sum at DeltaSync.java:448, took 0.000044 s
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 36
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 80
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 103
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 108
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 183
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 72
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 54
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 132
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 99
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 19
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 93
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 179
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 215
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 66
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 77
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 151
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 116
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 191
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 17
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 14
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 18
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 125
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 204
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 146
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 50
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 56
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 52
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 101
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 221
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 213
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 181
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 190
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 85
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 156
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 161
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 53
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 197
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 20
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 41
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 44
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 140
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 218
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 188
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 122
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 195
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 167
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 220
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 43
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 199
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 155
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 24
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 219
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 71
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 198
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 23
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 135
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 26
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 141
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 121
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 157
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 13
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 130
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned shuffle 0
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 7
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 138
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 63
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 187
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 32
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 196
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 48
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 206
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 119
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 160
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 90
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 40
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 113
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 68
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 224
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 28
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 202
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 10
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 139
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 76
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 49
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 137
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 58
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:38417 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:35696 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 4
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 211
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 212
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 83
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 203
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 33
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 86
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 82
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 95
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 142
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 111
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 98
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 184
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 46
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 129
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 104
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 159
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 59
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 25
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 173
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 79
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 153
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 189
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 51
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:449
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 6 finished: sum at DeltaSync.java:449, took 0.000035 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: collect at SparkRDDWriteClient.java:120
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 7 finished: collect at SparkRDDWriteClient.java:120, took 0.000039 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO util.CommitUtils: Creating metadata for UPSERT numWriteStats:0numReplaceFileIds:0
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Committing 20210526183328 action deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Marking instant complete [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Completed [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED], [==>20210526183328__deltacommit__INFLIGHT], [20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO table.HoodieTimelineArchiveLog: No Instants to archive
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Auto cleaning is enabled. Running cleaner now
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Scheduling cleaning at instant time :20210526183332
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote view for basePath /user/hd_xyz/yyy/ml_xxx/foo. Server=xxx:37089, Timeout=300
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO view.AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:32 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:32 INFO view.RemoteHoodieTableFileSystemView: Sending request : (http://xxx:37089/v1/hoodie/view/compactions/pending/?basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1)
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO collection.RocksDBDAO: DELETING RocksDB persisted at /tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11
21/05/26 18:33:33 INFO collection.RocksDBDAO: No column family found. Loading default
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:230] Creating manifest 1
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3406] Recovering from manifest file: MANIFEST-000001
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [default]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3610] Recovered from manifest file:/tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3618] Column family [default] (ID 0), log number is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:1287] DB pointer 0x7f3aaccf1f20
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:2936] Creating manifest 6
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo] (ID 1)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo] (ID 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo] (ID 3)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo] (ID 4)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 5)
21/05/26 18:33:33 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.4.117:53684) with ID 2
21/05/26 18:33:33 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 6)
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.AbstractTableFileSystemView: Took 9 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing pending compaction operations. Count=0
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing external data file mapping. Count=0
21/05/26 18:33:33 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting file groups in pending clustering to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Created ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix Search for (query=) on hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo. Total Time Taken (msec)=1. Serialization Time taken(micro)=0, num entries=0
21/05/26 18:33:33 INFO service.RequestHandler: TimeTakenMillis[Total=791, Refresh=779, handle=11, Check=1], Success=true, Query=basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1, Host=xxx:37089, synced=false
21/05/26 18:33:33 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:36920 with 912.3 MB RAM, BlockManagerId(2, xxx, 36920, None)
21/05/26 18:33:33 INFO clean.CleanPlanner: No earliest commit to retain. No need to scan partitions !!
21/05/26 18:33:33 INFO clean.CleanPlanner: Nothing to clean here. It is already clean
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaner started
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaned failed attempts if any
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Committed 20210526183328
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling table service COMPACT
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling compaction at instant time :20210526183333
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO compact.SparkScheduleCompactionActionExecutor: Checking if compaction needs to be run on /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO deltastreamer.DeltaSync: Commit 20210526183328 successful!
21/05/26 18:33:33 INFO rdd.MapPartitionsRDD: Removing RDD 29 from persistence list
21/05/26 18:33:33 INFO storage.BlockManager: Removing RDD 29
21/05/26 18:33:34 INFO rdd.MapPartitionsRDD: Removing RDD 37 from persistence list
21/05/26 18:33:34 INFO storage.BlockManager: Removing RDD 37
21/05/26 18:33:34 INFO deltastreamer.DeltaSync: Shutting down embedded timeline server
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closing Timeline server
21/05/26 18:33:34 INFO service.TimelineService: Closing Timeline Service
21/05/26 18:33:34 INFO javalin.Javalin: Stopping Javalin ...
21/05/26 18:33:34 INFO javalin.Javalin: Javalin has stopped
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closing Rocksdb !!
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:365] Shutdown: canceling all background work
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:521] Shutdown complete
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closed Rocksdb !!
21/05/26 18:33:34 INFO service.TimelineService: Closed Timeline Service
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closed Timeline server
21/05/26 18:33:34 INFO deltastreamer.HoodieDeltaStreamer: Shut down delta streamer
21/05/26 18:33:34 INFO server.AbstractConnector: Stopped Spark@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
21/05/26 18:33:34 INFO ui.SparkUI: Stopped Spark web UI at http://xxx:32822
21/05/26 18:33:34 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
21/05/26 18:33:34 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
21/05/26 18:33:34 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
21/05/26 18:33:34 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
21/05/26 18:33:34 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/05/26 18:33:34 INFO memory.MemoryStore: MemoryStore cleared
21/05/26 18:33:34 INFO storage.BlockManager: BlockManager stopped
21/05/26 18:33:34 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/05/26 18:33:34 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/05/26 18:33:34 INFO spark.SparkContext: Successfully stopped SparkContext
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
21/05/26 18:33:34 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
21/05/26 18:33:34 INFO util.ShutdownHookManager: Shutdown hook called
21/05/26 18:33:34 INFO util.ShutdownHookManager: Deleting directory /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/spark-4c7e81b9-e526-4325-abf0-d163828b92b5
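One line in the log above is worth isolating: `util.CommitUtils: Creating metadata for UPSERT numWriteStats:0numReplaceFileIds:0`. It suggests the delta commit was completed with zero write stats, which would be consistent with a table that contains only a `.hoodie` folder. Below is a minimal Python sketch for pulling that counter out of a saved driver log. The function name `commit_stats` and the sample string are illustrative assumptions, not part of Hudi.

```python
import re

# Matches the CommitUtils line as it appears in the log above; the two
# counters are printed back to back without a separator.
COMMIT_STATS = re.compile(
    r"Creating metadata for (\w+) numWriteStats:(\d+)numReplaceFileIds:(\d+)"
)

def commit_stats(log_text):
    """Return (operation, num_write_stats, num_replace_file_ids) per matching line."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)))
        for m in COMMIT_STATS.finditer(log_text)
    ]

# Sample copied from the log in this thread.
sample = (
    "21/05/26 18:33:32 INFO util.CommitUtils: Creating metadata for UPSERT "
    "numWriteStats:0numReplaceFileIds:0\n"
)
stats = commit_stats(sample)

# Commits whose write-stat count is zero are candidates for "nothing was written".
empty_commits = [s for s in stats if s[1] == 0]
print(stats)
```

A zero count here only says that the commit carried no write stats; it does not by itself explain why the source produced no records.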
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] PavelPetukhov removed a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov removed a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848891756
This is our full log
[spark_log.txt](https://github.com/apache/hudi/files/6548390/spark_log.txt)
[GitHub] [hudi] PavelPetukhov commented on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov commented on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848891756
There are no exceptions or errors in the logs. The only warnings I found are listed below:
21/05/26 18:33:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/05/26 18:33:19 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/05/26 18:33:19 WARN deltastreamer.SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
21/05/26 18:33:20 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
21/05/26 18:33:25 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.timestamp.type' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.output.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.delete.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.upsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.keygenerator.class' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.schemaprovider.registry.url' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.insert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.embed.timeline.server' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.bulkinsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.timezone' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.filesystem.view.type' was supplied but isn't a known config.
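Most of the warnings above come from a single logger, `consumer.ConsumerConfig`, which is the Kafka consumer reporting property keys it does not recognize (here, `hoodie.*` keys). Grouping warnings by logger makes that easier to see. This is a rough Python sketch, assuming the `yy/MM/dd HH:mm:ss LEVEL logger: message` layout used in this thread; `warnings_by_logger` is a hypothetical helper name.

```python
import re
from collections import Counter

# Log line layout seen in this thread: "yy/MM/dd HH:mm:ss LEVEL logger: message".
LOG_LINE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (\w+) ([\w.$]+): (.*)$")

def warnings_by_logger(lines):
    """Count WARN lines per logger name; other levels are ignored."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group(1) == "WARN":
            counts[m.group(2)] += 1
    return counts

# A few lines copied from the logs in this thread.
sample = [
    "21/05/26 18:33:18 WARN util.NativeCodeLoader: Unable to load native-hadoop "
    "library for your platform... using builtin-java classes where applicable",
    "21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration "
    "'hoodie.embed.timeline.server' was supplied but isn't a known config.",
    "21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration "
    "'hoodie.filesystem.view.type' was supplied but isn't a known config.",
    "21/05/26 18:33:20 INFO spark.SparkEnv: Registering MapOutputTracker",
]
counts = warnings_by_logger(sample)
for logger, n in counts.most_common():
    print(logger, n)
```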
[GitHub] [hudi] PavelPetukhov commented on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov commented on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848885930
The .hoodie directory structure is the following:
hdfs dfs -ls /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie
Found 7 items
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/.aux
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/.temp
-rw-r--r-- 3 hdfs hadoop 1201 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/20210526183328.deltacommit
-rw-r--r-- 3 hdfs hadoop 518 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/20210526183328.deltacommit.inflight
-rw-r--r-- 3 hdfs hadoop 0 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/20210526183328.deltacommit.requested
drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/archived
-rw-r--r-- 3 hdfs hadoop 391 2021-05-26 18:33 /user/hdfs/raw_data/public/ml_training_data/foo/.hoodie/hoodie.properties
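The listing shows all three timeline files for instant 20210526183328 (`.requested`, `.inflight`, and the completed `.deltacommit`), so the commit itself appears to have finished. What is absent are data files outside `.hoodie`. As a sketch, the listing can be reduced to a per-instant set of states in Python. The helper name `timeline_states` is hypothetical, and the file-name pattern is inferred only from the names shown above.

```python
import re
from collections import defaultdict

# Timeline file names as they appear in the listing above:
# <instant time>.<action>[.<state>]; a missing state suffix is treated as completed.
INSTANT_FILE = re.compile(r"^(\d{14})\.(\w+)(?:\.(requested|inflight))?$")

def timeline_states(ls_output):
    """Map instant time -> set of states found in an `hdfs dfs -ls` listing."""
    states = defaultdict(set)
    for line in ls_output.splitlines():
        parts = line.split()
        if not parts or not parts[-1].startswith("/"):
            continue  # skip "Found N items" and blank lines
        name = parts[-1].rsplit("/", 1)[-1]
        m = INSTANT_FILE.match(name)
        if m:
            states[m.group(1)].add(m.group(3) or "completed")
    return dict(states)

# Entries copied from the listing in this comment.
base = "/user/hdfs/raw_data/public/ml_training_data/foo/.hoodie"
sample = "\n".join([
    "Found 7 items",
    f"drwxr-xr-x - hdfs hadoop 0 2021-05-26 18:33 {base}/.aux",
    f"-rw-r--r-- 3 hdfs hadoop 1201 2021-05-26 18:33 {base}/20210526183328.deltacommit",
    f"-rw-r--r-- 3 hdfs hadoop 518 2021-05-26 18:33 {base}/20210526183328.deltacommit.inflight",
    f"-rw-r--r-- 3 hdfs hadoop 0 2021-05-26 18:33 {base}/20210526183328.deltacommit.requested",
    f"-rw-r--r-- 3 hdfs hadoop 391 2021-05-26 18:33 {base}/hoodie.properties",
])
states = timeline_states(sample)
print(states)
```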
[GitHub] [hudi] PavelPetukhov edited a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov edited a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848930327
"""
Log Type: stderr
Log Upload Time: Wed May 26 18:33:34 +0300 2021
Log Length: 104910
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for TERM
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for HUP
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for INT
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/05/26 18:33:18 INFO yarn.ApplicationMaster: Preparing Local resources
21/05/26 18:33:19 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/05/26 18:33:19 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1618828995116_0162_000001
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Waiting for spark context initialization...
21/05/26 18:33:19 WARN deltastreamer.SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
21/05/26 18:33:19 INFO spark.SparkContext: Running Spark version 2.4.7
21/05/26 18:33:19 INFO spark.SparkContext: Submitted application: xxx
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'sparkDriver' on port 37691.
21/05/26 18:33:20 INFO spark.SparkEnv: Registering MapOutputTracker
21/05/26 18:33:20 INFO spark.SparkEnv: Registering BlockManagerMaster
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/05/26 18:33:20 INFO storage.DiskBlockManager: Created local directory at /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/blockmgr-9de167db-4756-414e-9126-32cb562e91aa
21/05/26 18:33:20 INFO memory.MemoryStore: MemoryStore started with capacity 912.3 MB
21/05/26 18:33:20 INFO spark.SparkEnv: Registering OutputCommitCoordinator
21/05/26 18:33:20 INFO util.log: Logging initialized @2935ms
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
21/05/26 18:33:20 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
21/05/26 18:33:20 INFO server.Server: Started @3069ms
21/05/26 18:33:20 INFO server.AbstractConnector: Started ServerConnector@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:32822}
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'SparkUI' on port 32822.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@43837fbc{/jobs,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d91ba30{/jobs/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4854d5d9{/jobs/job,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@672e7ec3{/jobs/job/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@67ee182c{/stages,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@97af315{/stages/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1936a0e0{/stages/stage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@447ef19e{/stages/stage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68e36851{/stages/pool,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@352fe12b{/stages/pool/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d39f28d{/storage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e7806b5{/storage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d2a56cb{/storage/rdd,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37c6c6fc{/storage/rdd/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4599e713{/environment,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b9a0cbb{/environment/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@24299f0d{/executors,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@25594c52{/executors/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f728695{/executors/threadDump,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7456a814{/executors/threadDump/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1cef9064{/static,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@16ba2eda{/,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@dac88e2{/api,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@145850ef{/jobs/job/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d678cf2{/stages/stage/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://xxx:32822
21/05/26 18:33:20 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
21/05/26 18:33:20 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1618828995116_0162 and attemptId Some(appattempt_1618828995116_0162_000001)
21/05/26 18:33:20 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:20 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38417.
21/05/26 18:33:20 INFO netty.NettyBlockTransferService: Server created on xxx:38417
21/05/26 18:33:20 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:38417 with 912.3 MB RAM, BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManager: external shuffle service port = 7337
21/05/26 18:33:20 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b3c78ce{/metrics/json,null,AVAILABLE,@Spark}
21/05/26 18:33:21 INFO scheduler.EventLoggingListener: Logging events to hdfs://xxx:8020/eventLogging/application_1618828995116_0162_1
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
21/05/26 18:33:21 INFO client.RMProxy: Connecting to ResourceManager at xxx/10.246.4.117:8030
21/05/26 18:33:21 INFO yarn.YarnRMClient: Registering the ApplicationMaster
21/05/26 18:33:21 INFO yarn.ApplicationMaster:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/conf<CPS>/usr/hdp/2.6.0.3-8/hadoop/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>/usr/hdp/current/ext/hadoop/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR -> hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
SPARK_USER -> hdfs
command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx2048m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.driver.port=37691' \
'-Dspark.ui.port=0' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@xxx:37691 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1618828995116_0162 \
--user-class-path \
file:$PWD/__app__.jar \
--user-class-path \
file:$PWD/org.apache.spark_spark-avro_2.12-2.4.7.jar \
--user-class-path \
file:$PWD/org.spark-project.spark_unused-1.0.0.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
org.apache.spark_spark-avro_2.12-2.4.7.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.apache.spark_spark-avro_2.12-2.4.7.jar" } size: 107269 timestamp: 1622043191967 type: FILE visibility: PRIVATE
__app__.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/jars/hudi/hudi-utilities-bundle_2.12-0.8.0.jar" } size: 40399204 timestamp: 1622022896130 type: FILE visibility: PUBLIC
__spark_conf__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_conf__.zip" } size: 205423 timestamp: 1622043193955 type: ARCHIVE visibility: PRIVATE
org.spark-project.spark_unused-1.0.0.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.spark-project.spark_unused-1.0.0.jar" } size: 2777 timestamp: 1622043192905 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_libs__2858796966972713370.zip" } size: 242613518 timestamp: 1622043190403 type: ARCHIVE visibility: PRIVATE
===============================================================================
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@xxx:37691)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:21 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
21/05/26 18:33:22 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:22 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000002 on host xxx for executor with ID 1
21/05/26 18:33:22 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:25 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.3.9:49980) with ID 1
21/05/26 18:33:25 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
21/05/26 18:33:25 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
21/05/26 18:33:25 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO utilities.UtilHelpers: Adding overridden properties to file properties.
21/05/26 18:33:25 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
21/05/26 18:33:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:35696 with 912.3 MB RAM, BlockManagerId(1, xxx, 35696, None)
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Creating delta streamer with configs : {hoodie.deltastreamer.keygen.timebased.input.timezone=, hoodie.embed.timeline.server=true, schema.registry.url=http://xxx, hoodie.filesystem.view.type=EMBEDDED_KV_STORE, hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-ddTHH:mm:ssZ,yyyy-MM-ddTHH:mm:ss.SSSZ, hoodie.delete.shuffle.parallelism=2, hoodie.bulkinsert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd, group.id=hudi_group_080, auto.offset.reset=earliest, hoodie.insert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator, hoodie.deltastreamer.source.kafka.topic=xxx, bootstrap.servers=xxx:9092, hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=, hoodie.deltastreamer.schemaprovider.registry.url=http://xxx/subjects/xxx-value/versions/latest, hoodie.datasource.write.recordkey.field=id, hoodie.upsert.shuffle.parallelism=2, hoodie.datasource.write.partitionpath.field=date:TIMESTAMP}
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Initializing /user/hd_xyz/yyy/ml_xxx/foo as hoodie table /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished initializing Table of type MERGE_ON_READ from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Delta Streamer running only single round
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:26 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:26 INFO deltastreamer.DeltaSync: Checkpoint to resume from : Optional.empty
21/05/26 18:33:26 INFO consumer.ConsumerConfig: ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [xxx]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = hudi_group_080
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class io.confluent.kafka.serializers.KafkaAvroDeserializer
21/05/26 18:33:26 INFO serializers.KafkaAvroDeserializerConfig: KafkaAvroDeserializerConfig values:
schema.registry.url = [xxx]
max.schemas.per.subject = 1000
specific.avro.reader = false
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.timestamp.type' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.output.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.delete.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.upsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.keygenerator.class' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.schemaprovider.registry.url' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.insert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.embed.timeline.server' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.bulkinsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.timezone' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.filesystem.view.type' was supplied but isn't a known config.
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka version: 2.4.1
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka commitId: c57222ae8cd7866b
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka startTimeMs: 1622043206225
21/05/26 18:33:26 INFO clients.Metadata: [Consumer clientId=consumer-hudi_group_080-1, groupId=hudi_group_080] Cluster ID: 5XoPi9AYT0mbHVQEj6VEaw
21/05/26 18:33:27 INFO helpers.KafkaOffsetGen: SourceLimit not configured, set numEvents to default value : 5000000
21/05/26 18:33:27 INFO sources.AvroKafkaSource: About to read 0 from Kafka for topic :xxx
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: No new data, perform empty commit.
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Setting up new Hoodie Write Client
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Starting Timeline service !!
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Overriding hostIp to (xxx) found in spark-conf. It was null
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :EMBEDDED_KV_STORE
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating embedded rocks-db based Table View
21/05/26 18:33:27 INFO util.log: Logging initialized @9978ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
21/05/26 18:33:27 INFO javalin.Javalin:
__ __ _
/ /____ _ _ __ ____ _ / /(_)____
__ / // __ `/| | / // __ `// // // __ \
/ /_/ // /_/ / | |/ // /_/ // // // / / /
\____/ \__,_/ |___/ \__,_//_//_//_/ /_/
https://javalin.io/documentation
21/05/26 18:33:27 INFO javalin.Javalin: Starting Javalin ...
21/05/26 18:33:27 INFO javalin.Javalin: Listening on http://localhost:37089/
21/05/26 18:33:27 INFO javalin.Javalin: Javalin started in 179ms \o/
21/05/26 18:33:27 INFO service.TimelineService: Starting Timeline server on port :37089
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Started embedded timeline server at xxx:37089
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO client.AbstractHoodieClient: Timeline Server already running. Not restarting the service
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO client.AbstractHoodieWriteClient: Generate a new instant time: 20210526183328 action: deltacommit
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Creating a new instant [==>20210526183328__deltacommit__REQUESTED]
21/05/26 18:33:28 INFO deltastreamer.DeltaSync: Starting commit : 20210526183328
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED]]
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:28 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:28 INFO client.AsyncCleanerService: Auto cleaning is not enabled. Not running cleaner now
21/05/26 18:33:28 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:114
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 1 (mapToPair at SparkWriteHelper.java:54) as input to shuffle 1
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 5 (countByKey at SparkHoodieBloomIndex.java:114) as input to shuffle 0
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Got job 0 (countByKey at SparkHoodieBloomIndex.java:114) with 2 output partitions
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.2 KB, free 912.3 MB)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Driver requested a total number of 2 executor(s).
21/05/26 18:33:28 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:28 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.3 KB, free 912.3 MB)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:28 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:28 INFO cluster.YarnClusterScheduler: Adding task set 1.0 with 2 tasks
21/05/26 18:33:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:29 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000004 on host xxx for executor with ID 2
21/05/26 18:33:29 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_0 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_1 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1023 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 70 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (countByKey at SparkHoodieBloomIndex.java:114) finished in 1.177 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.8 KB, free 912.3 MB)
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.2 KB, free 912.3 MB)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Adding task set 2.0 with 2 tasks
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.246.3.9:49980
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 3, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 85 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 3) in 32 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114) finished in 0.126 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Job 0 finished: countByKey at SparkHoodieBloomIndex.java:114, took 1.627903 s
21/05/26 18:33:29 INFO yarn.YarnAllocator: Driver requested a total number of 1 executor(s).
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 1 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (collect at HoodieSparkEngineContext.java:78)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 368.5 KB, free 911.9 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 101.0 KB, free 911.8 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:38417 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 3.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:35696 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 178 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 3 (collect at HoodieSparkEngineContext.java:78) finished in 0.233 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 1 finished: collect at HoodieSparkEngineContext.java:78, took 0.236923 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:73
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 2 (collect at HoodieSparkEngineContext.java:73) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (collect at HoodieSparkEngineContext.java:73)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 368.3 KB, free 911.5 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 100.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:38417 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 4.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 5, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:35696 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 5) in 94 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 4.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 4 (collect at HoodieSparkEngineContext.java:73) finished in 0.167 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 2 finished: collect at HoodieSparkEngineContext.java:73, took 0.174163 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:149
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Registering RDD 14 (countByKey at SparkHoodieBloomIndex.java:149) as input to shuffle 2
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 3 (countByKey at SparkHoodieBloomIndex.java:149) with 2 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 7.5 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:38417 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 6.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 6, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:35696 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 6.0 (TID 7, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 6.0 (TID 6) in 60 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 6.0 (TID 7) in 36 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ShuffleMapStage 6 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.121 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:30 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 7)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.8 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 7.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 7.0 (TID 8, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 10.246.3.9:49980
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 7.0 (TID 9, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 7.0 (TID 8) in 47 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 7.0 (TID 9) in 20 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 7.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.081 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 3 finished: countByKey at SparkHoodieBloomIndex.java:149, took 0.219895 s
21/05/26 18:33:30 INFO bloom.SparkHoodieBloomIndex: InputParallelism: ${2}, IndexParallelism: ${0}
21/05/26 18:33:30 INFO bloom.BucketizedBloomCheckPartitioner: TotalBuckets 0, min_buckets/partition 1
21/05/26 18:33:30 INFO rdd.MapPartitionsRDD: Removing RDD 3 from persistence list
21/05/26 18:33:30 INFO storage.BlockManager: Removing RDD 3
21/05/26 18:33:31 INFO rdd.MapPartitionsRDD: Removing RDD 22 from persistence list
21/05/26 18:33:31 INFO storage.BlockManager: Removing RDD 22
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: countByKey at BaseSparkCommitActionExecutor.java:158
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 16 (mapToPair at SparkHoodieBloomIndex.java:266) as input to shuffle 6
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 23 (mapToPair at SparkHoodieBloomIndex.java:287) as input to shuffle 3
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 22 (flatMapToPair at SparkHoodieBloomIndex.java:274) as input to shuffle 4
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 31 (countByKey at BaseSparkCommitActionExecutor.java:158) as input to shuffle 5
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Got job 4 (countByKey at BaseSparkCommitActionExecutor.java:158) with 2 output partitions
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 5.9 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 3.3 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 10.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 10.0 (TID 10, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 10.0 (TID 11, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 10.0 (TID 10) in 50 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 10.0 (TID 11) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 10.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 10 (mapToPair at SparkHoodieBloomIndex.java:287) finished in 0.092 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 12, ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 7.1 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:38417 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 12.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 12.0 (TID 12, xxx, executor 1, partition 0, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:35696 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to 10.246.3.9:49980
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 4 to 10.246.3.9:49980
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_0 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 12.0 (TID 13, xxx, executor 1, partition 1, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 12.0 (TID 12) in 105 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_1 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 12.0 (TID 13) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 12.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 12 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.146 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 13.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 14, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 5 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 13.0 (TID 15, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 13.0 (TID 14) in 31 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 13.0 (TID 15) in 12 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 13.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.064 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 4 finished: countByKey at BaseSparkCommitActionExecutor.java:158, took 0.320123 s
21/05/26 18:33:31 INFO commit.BaseSparkCommitActionExecutor: Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=0, numUpdates=0}, partitionStat={}, operationType=UPSERT}
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.requested
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:31 INFO commit.UpsertPartitioner: AvgRecordSize => 1024
21/05/26 18:33:31 INFO view.AbstractTableFileSystemView: Took 3 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:31 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:31 INFO commit.UpsertPartitioner: Total Buckets :0, buckets info => {},
Partition to insert buckets => {},
UpdateLocations mapped to buckets =>{}
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 175
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 62
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 9
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 148
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 105
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 143
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 55
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 209
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 154
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 147
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 163
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 69
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 34
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 100
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 1
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 193
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 169
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 27
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 16
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 115
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 120
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 106
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 174
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 210
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 96
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 6
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 57
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 133
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 11
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 74
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 107
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 164
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 172
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 176
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 194
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 109
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 37
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 177
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 128
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 182
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 205
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 30
21/05/26 18:33:31 INFO commit.BaseCommitActionExecutor: Auto commit disabled for 20210526183328
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 102
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 180
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 150
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 186
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 89
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 223
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 47
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 158
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 162
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 88
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 39
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 8
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 29
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 124
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 75
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 165
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 217
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 134
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 35
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 216
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 22
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 114
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 152
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 42
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 94
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 145
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 126
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 144
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 168
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:38417 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:35696 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 149
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 38
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 70
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 15
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 118
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 166
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 207
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 170
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 171
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 65
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 97
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 110
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 222
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 87
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 192
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 201
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 117
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 123
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 12
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 60
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 84
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 127
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 91
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 136
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 45
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 200
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 64
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:38417 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:35696 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 92
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 0
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 81
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 185
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 214
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 21
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 31
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 67
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 112
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 178
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 208
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 78
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 73
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 131
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 61
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 3
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:38417 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:35696 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:448
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 5 finished: sum at DeltaSync.java:448, took 0.000044 s
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 36
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 80
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 103
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 108
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 183
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 72
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 54
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 132
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 99
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 19
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 93
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 179
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 215
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 66
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 77
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 151
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 116
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 191
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 17
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 14
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 18
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 125
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 204
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 146
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 50
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 56
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 52
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 101
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 221
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 213
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 181
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 190
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 85
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 156
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 161
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 53
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 197
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 20
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 41
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 44
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 140
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 218
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 188
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 122
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 195
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 167
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 220
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 43
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 199
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 155
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 24
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 219
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 71
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 198
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 23
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 135
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 26
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 141
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 121
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 157
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 13
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 130
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned shuffle 0
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 7
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 138
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 63
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 187
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 32
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 196
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 48
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 206
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 119
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 160
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 90
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 40
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 113
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 68
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 224
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 28
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 202
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 10
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 139
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 76
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 49
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 137
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 58
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:38417 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:35696 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 4
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 211
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 212
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 83
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 203
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 33
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 86
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 82
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 95
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 142
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 111
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 98
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 184
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 46
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 129
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 104
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 159
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 59
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 25
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 173
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 79
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 153
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 189
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 51
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:449
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 6 finished: sum at DeltaSync.java:449, took 0.000035 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: collect at SparkRDDWriteClient.java:120
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 7 finished: collect at SparkRDDWriteClient.java:120, took 0.000039 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO util.CommitUtils: Creating metadata for UPSERT numWriteStats:0numReplaceFileIds:0
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Committing 20210526183328 action deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Marking instant complete [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Completed [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED], [==>20210526183328__deltacommit__INFLIGHT], [20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO table.HoodieTimelineArchiveLog: No Instants to archive
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Auto cleaning is enabled. Running cleaner now
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Scheduling cleaning at instant time :20210526183332
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote view for basePath /user/hd_xyz/yyy/ml_xxx/foo. Server=xxx:37089, Timeout=300
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO view.AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:32 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:32 INFO view.RemoteHoodieTableFileSystemView: Sending request : (http://xxx:37089/v1/hoodie/view/compactions/pending/?basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1)
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO collection.RocksDBDAO: DELETING RocksDB persisted at /tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11
21/05/26 18:33:33 INFO collection.RocksDBDAO: No column family found. Loading default
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:230] Creating manifest 1
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3406] Recovering from manifest file: MANIFEST-000001
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [default]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3610] Recovered from manifest file:/tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3618] Column family [default] (ID 0), log number is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:1287] DB pointer 0x7f3aaccf1f20
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:2936] Creating manifest 6
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo] (ID 1)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo] (ID 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo] (ID 3)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo] (ID 4)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 5)
21/05/26 18:33:33 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.4.117:53684) with ID 2
21/05/26 18:33:33 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 6)
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.AbstractTableFileSystemView: Took 9 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing pending compaction operations. Count=0
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing external data file mapping. Count=0
21/05/26 18:33:33 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting file groups in pending clustering to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Created ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix Search for (query=) on hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo. Total Time Taken (msec)=1. Serialization Time taken(micro)=0, num entries=0
21/05/26 18:33:33 INFO service.RequestHandler: TimeTakenMillis[Total=791, Refresh=779, handle=11, Check=1], Success=true, Query=basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1, Host=xxx:37089, synced=false
21/05/26 18:33:33 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:36920 with 912.3 MB RAM, BlockManagerId(2, xxx, 36920, None)
21/05/26 18:33:33 INFO clean.CleanPlanner: No earliest commit to retain. No need to scan partitions !!
21/05/26 18:33:33 INFO clean.CleanPlanner: Nothing to clean here. It is already clean
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaner started
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaned failed attempts if any
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Committed 20210526183328
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling table service COMPACT
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling compaction at instant time :20210526183333
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO compact.SparkScheduleCompactionActionExecutor: Checking if compaction needs to be run on /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO deltastreamer.DeltaSync: Commit 20210526183328 successful!
21/05/26 18:33:33 INFO rdd.MapPartitionsRDD: Removing RDD 29 from persistence list
21/05/26 18:33:33 INFO storage.BlockManager: Removing RDD 29
21/05/26 18:33:34 INFO rdd.MapPartitionsRDD: Removing RDD 37 from persistence list
21/05/26 18:33:34 INFO storage.BlockManager: Removing RDD 37
21/05/26 18:33:34 INFO deltastreamer.DeltaSync: Shutting down embedded timeline server
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closing Timeline server
21/05/26 18:33:34 INFO service.TimelineService: Closing Timeline Service
21/05/26 18:33:34 INFO javalin.Javalin: Stopping Javalin ...
21/05/26 18:33:34 INFO javalin.Javalin: Javalin has stopped
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closing Rocksdb !!
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:365] Shutdown: canceling all background work
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:521] Shutdown complete
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closed Rocksdb !!
21/05/26 18:33:34 INFO service.TimelineService: Closed Timeline Service
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closed Timeline server
21/05/26 18:33:34 INFO deltastreamer.HoodieDeltaStreamer: Shut down delta streamer
21/05/26 18:33:34 INFO server.AbstractConnector: Stopped Spark@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
21/05/26 18:33:34 INFO ui.SparkUI: Stopped Spark web UI at http://xxx:32822
21/05/26 18:33:34 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
21/05/26 18:33:34 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
21/05/26 18:33:34 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
21/05/26 18:33:34 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
21/05/26 18:33:34 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/05/26 18:33:34 INFO memory.MemoryStore: MemoryStore cleared
21/05/26 18:33:34 INFO storage.BlockManager: BlockManager stopped
21/05/26 18:33:34 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/05/26 18:33:34 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/05/26 18:33:34 INFO spark.SparkContext: Successfully stopped SparkContext
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
21/05/26 18:33:34 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
21/05/26 18:33:34 INFO util.ShutdownHookManager: Shutdown hook called
21/05/26 18:33:34 INFO util.ShutdownHookManager: Deleting directory /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/spark-4c7e81b9-e526-4325-abf0-d163828b92b5
"""
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] PavelPetukhov edited a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov edited a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848930327
This is our full log:
[spark_log.txt](https://github.com/apache/hudi/files/6548394/spark_log.txt)
[GitHub] [hudi] PavelPetukhov edited a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov edited a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848930327
`
Log Type: stderr
Log Upload Time: Wed May 26 18:33:34 +0300 2021
Log Length: 104910
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for TERM
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for HUP
21/05/26 18:33:18 INFO util.SignalUtils: Registered signal handler for INT
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:18 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:18 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/05/26 18:33:18 INFO yarn.ApplicationMaster: Preparing Local resources
21/05/26 18:33:19 WARN shortcircuit.DomainSocketFactory: The short-circuit local reads feature cannot be used because libhadoop cannot be loaded.
21/05/26 18:33:19 INFO yarn.ApplicationMaster: ApplicationAttemptId: appattempt_1618828995116_0162_000001
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Starting the user application in a separate Thread
21/05/26 18:33:19 INFO yarn.ApplicationMaster: Waiting for spark context initialization...
21/05/26 18:33:19 WARN deltastreamer.SchedulerConfGenerator: Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
21/05/26 18:33:19 INFO spark.SparkContext: Running Spark version 2.4.7
21/05/26 18:33:19 INFO spark.SparkContext: Submitted application: xxx
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls to: yarn,hdfs
21/05/26 18:33:19 INFO spark.SecurityManager: Changing view acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: Changing modify acls groups to:
21/05/26 18:33:19 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(yarn, hdfs); groups with view permissions: Set(); users with modify permissions: Set(yarn, hdfs); groups with modify permissions: Set()
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'sparkDriver' on port 37691.
21/05/26 18:33:20 INFO spark.SparkEnv: Registering MapOutputTracker
21/05/26 18:33:20 INFO spark.SparkEnv: Registering BlockManagerMaster
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/05/26 18:33:20 INFO storage.DiskBlockManager: Created local directory at /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/blockmgr-9de167db-4756-414e-9126-32cb562e91aa
21/05/26 18:33:20 INFO memory.MemoryStore: MemoryStore started with capacity 912.3 MB
21/05/26 18:33:20 INFO spark.SparkEnv: Registering OutputCommitCoordinator
21/05/26 18:33:20 INFO util.log: Logging initialized @2935ms
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /jobs, /jobs/json, /jobs/job, /jobs/job/json, /stages, /stages/json, /stages/stage, /stages/stage/json, /stages/pool, /stages/pool/json, /storage, /storage/json, /storage/rdd, /storage/rdd/json, /environment, /environment/json, /executors, /executors/json, /executors/threadDump, /executors/threadDump/json, /static, /, /api, /jobs/job/kill, /stages/stage/kill.
21/05/26 18:33:20 INFO server.Server: jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown
21/05/26 18:33:20 INFO server.Server: Started @3069ms
21/05/26 18:33:20 INFO server.AbstractConnector: Started ServerConnector@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:32822}
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'SparkUI' on port 32822.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@43837fbc{/jobs,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@d91ba30{/jobs/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4854d5d9{/jobs/job,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@672e7ec3{/jobs/job/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@67ee182c{/stages,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@97af315{/stages/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1936a0e0{/stages/stage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@447ef19e{/stages/stage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68e36851{/stages/pool,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@352fe12b{/stages/pool/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3d39f28d{/storage,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@e7806b5{/storage/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7d2a56cb{/storage/rdd,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@37c6c6fc{/storage/rdd/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4599e713{/environment,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b9a0cbb{/environment/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@24299f0d{/executors,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@25594c52{/executors/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2f728695{/executors/threadDump,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7456a814{/executors/threadDump/json,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1cef9064{/static,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@16ba2eda{/,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@dac88e2{/api,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@145850ef{/jobs/job/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6d678cf2{/stages/stage/kill,null,AVAILABLE,@Spark}
21/05/26 18:33:20 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://xxx:32822
21/05/26 18:33:20 INFO cluster.YarnClusterScheduler: Created YarnClusterScheduler
21/05/26 18:33:20 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1618828995116_0162 and attemptId Some(appattempt_1618828995116_0162_000001)
21/05/26 18:33:20 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:20 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:20 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 38417.
21/05/26 18:33:20 INFO netty.NettyBlockTransferService: Server created on xxx:38417
21/05/26 18:33:20 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:38417 with 912.3 MB RAM, BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO storage.BlockManager: external shuffle service port = 7337
21/05/26 18:33:20 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, xxx, 38417, None)
21/05/26 18:33:20 INFO ui.JettyUtils: Adding filter org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter to /metrics/json.
21/05/26 18:33:20 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1b3c78ce{/metrics/json,null,AVAILABLE,@Spark}
21/05/26 18:33:21 INFO scheduler.EventLoggingListener: Logging events to hdfs://xxx:8020/eventLogging/application_1618828995116_0162_1
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Attempted to request executors before the AM has registered!
21/05/26 18:33:21 INFO client.RMProxy: Connecting to ResourceManager at xxx/10.246.4.117:8030
21/05/26 18:33:21 INFO yarn.YarnRMClient: Registering the ApplicationMaster
21/05/26 18:33:21 INFO yarn.ApplicationMaster:
===============================================================================
YARN executor launch context:
env:
CLASSPATH -> {{PWD}}<CPS>{{PWD}}/__spark_conf__<CPS>{{PWD}}/__spark_libs__/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/conf<CPS>/usr/hdp/2.6.0.3-8/hadoop/*<CPS>/usr/hdp/2.6.0.3-8/hadoop/lib/*<CPS>/usr/hdp/current/hadoop-hdfs-client/*<CPS>/usr/hdp/current/hadoop-hdfs-client/lib/*<CPS>/usr/hdp/current/hadoop-yarn-client/*<CPS>/usr/hdp/current/hadoop-yarn-client/lib/*<CPS>/usr/hdp/current/ext/hadoop/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*<CPS>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*<CPS>{{PWD}}/__spark_conf__/__hadoop_conf__
SPARK_YARN_STAGING_DIR -> hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
SPARK_USER -> hdfs
command:
{{JAVA_HOME}}/bin/java \
-server \
-Xmx2048m \
-Djava.io.tmpdir={{PWD}}/tmp \
'-Dspark.driver.port=37691' \
'-Dspark.ui.port=0' \
-Dspark.yarn.app.container.log.dir=<LOG_DIR> \
-XX:OnOutOfMemoryError='kill %p' \
org.apache.spark.executor.CoarseGrainedExecutorBackend \
--driver-url \
spark://CoarseGrainedScheduler@xxx:37691 \
--executor-id \
<executorId> \
--hostname \
<hostname> \
--cores \
1 \
--app-id \
application_1618828995116_0162 \
--user-class-path \
file:$PWD/__app__.jar \
--user-class-path \
file:$PWD/org.apache.spark_spark-avro_2.12-2.4.7.jar \
--user-class-path \
file:$PWD/org.spark-project.spark_unused-1.0.0.jar \
1><LOG_DIR>/stdout \
2><LOG_DIR>/stderr
resources:
org.apache.spark_spark-avro_2.12-2.4.7.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.apache.spark_spark-avro_2.12-2.4.7.jar" } size: 107269 timestamp: 1622043191967 type: FILE visibility: PRIVATE
__app__.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/jars/hudi/hudi-utilities-bundle_2.12-0.8.0.jar" } size: 40399204 timestamp: 1622022896130 type: FILE visibility: PUBLIC
__spark_conf__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_conf__.zip" } size: 205423 timestamp: 1622043193955 type: ARCHIVE visibility: PRIVATE
org.spark-project.spark_unused-1.0.0.jar -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/org.spark-project.spark_unused-1.0.0.jar" } size: 2777 timestamp: 1622043192905 type: FILE visibility: PRIVATE
__spark_libs__ -> resource { scheme: "hdfs" host: "xxx" port: 8020 file: "/user/hd_xyz/.sparkStaging/application_1618828995116_0162/__spark_libs__2858796966972713370.zip" } size: 242613518 timestamp: 1622043190403 type: ARCHIVE visibility: PRIVATE
===============================================================================
21/05/26 18:33:21 WARN util.Utils: spark.executor.instances less than spark.dynamicAllocation.minExecutors is invalid, ignoring its setting, please update your configs.
21/05/26 18:33:21 INFO util.Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances
21/05/26 18:33:21 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark://YarnAM@xxx:37691)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:21 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:21 INFO yarn.ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
21/05/26 18:33:22 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:22 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000002 on host xxx for executor with ID 1
21/05/26 18:33:22 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:22 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:25 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.3.9:49980) with ID 1
21/05/26 18:33:25 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1)
21/05/26 18:33:25 INFO cluster.YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
21/05/26 18:33:25 INFO cluster.YarnClusterScheduler: YarnClusterScheduler.postStartHook done
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO utilities.UtilHelpers: Adding overridden properties to file properties.
21/05/26 18:33:25 WARN spark.SparkContext: Using an existing SparkContext; some configuration may not take effect.
21/05/26 18:33:25 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:35696 with 912.3 MB RAM, BlockManagerId(1, xxx, 35696, None)
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Creating delta streamer with configs : {hoodie.deltastreamer.keygen.timebased.input.timezone=, hoodie.embed.timeline.server=true, schema.registry.url=http://xxx, hoodie.filesystem.view.type=EMBEDDED_KV_STORE, hoodie.deltastreamer.keygen.timebased.input.dateformat=yyyy-MM-ddTHH:mm:ssZ,yyyy-MM-ddTHH:mm:ss.SSSZ, hoodie.delete.shuffle.parallelism=2, hoodie.bulkinsert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.output.dateformat=yyyy/MM/dd, group.id=hudi_group_080, auto.offset.reset=earliest, hoodie.insert.shuffle.parallelism=2, hoodie.deltastreamer.keygen.timebased.timestamp.type=DATE_STRING, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.CustomKeyGenerator, hoodie.deltastreamer.source.kafka.topic=xxx, bootstrap.servers=xxx:9092, hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex=, hoodie.deltastreamer.schemaprovider.registry.url=http://xxx/subjects/xxx-value/versions/latest, hoodie.datasource.write.recordkey.field=id, hoodie.upsert.shuffle.parallelism=2, hoodie.datasource.write.partitionpath.field=date:TIMESTAMP}
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Initializing /user/hd_xyz/yyy/ml_xxx/foo as hoodie table /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished initializing Table of type MERGE_ON_READ from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:25 INFO deltastreamer.HoodieDeltaStreamer: Delta Streamer running only single round
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:25 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:25 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:25 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:26 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:26 INFO deltastreamer.DeltaSync: Checkpoint to resume from : Optional.empty
21/05/26 18:33:26 INFO consumer.ConsumerConfig: ConsumerConfig values:
allow.auto.create.topics = true
auto.commit.interval.ms = 5000
auto.offset.reset = earliest
bootstrap.servers = [xxx]
check.crcs = true
client.dns.lookup = default
client.id =
client.rack =
connections.max.idle.ms = 540000
default.api.timeout.ms = 60000
enable.auto.commit = true
exclude.internal.topics = true
fetch.max.bytes = 52428800
fetch.max.wait.ms = 500
fetch.min.bytes = 1
group.id = hudi_group_080
group.instance.id = null
heartbeat.interval.ms = 3000
interceptor.classes = []
internal.leave.group.on.close = true
isolation.level = read_uncommitted
key.deserializer = class org.apache.kafka.common.serialization.StringDeserializer
max.partition.fetch.bytes = 1048576
max.poll.interval.ms = 300000
max.poll.records = 500
metadata.max.age.ms = 300000
metric.reporters = []
metrics.num.samples = 2
metrics.recording.level = INFO
metrics.sample.window.ms = 30000
partition.assignment.strategy = [class org.apache.kafka.clients.consumer.RangeAssignor]
receive.buffer.bytes = 65536
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
retry.backoff.ms = 100
sasl.client.callback.handler.class = null
sasl.jaas.config = null
sasl.kerberos.kinit.cmd = /usr/bin/kinit
sasl.kerberos.min.time.before.relogin = 60000
sasl.kerberos.service.name = null
sasl.kerberos.ticket.renew.jitter = 0.05
sasl.kerberos.ticket.renew.window.factor = 0.8
sasl.login.callback.handler.class = null
sasl.login.class = null
sasl.login.refresh.buffer.seconds = 300
sasl.login.refresh.min.period.seconds = 60
sasl.login.refresh.window.factor = 0.8
sasl.login.refresh.window.jitter = 0.05
sasl.mechanism = GSSAPI
security.protocol = PLAINTEXT
security.providers = null
send.buffer.bytes = 131072
session.timeout.ms = 10000
ssl.cipher.suites = null
ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
ssl.endpoint.identification.algorithm = https
ssl.key.password = null
ssl.keymanager.algorithm = SunX509
ssl.keystore.location = null
ssl.keystore.password = null
ssl.keystore.type = JKS
ssl.protocol = TLS
ssl.provider = null
ssl.secure.random.implementation = null
ssl.trustmanager.algorithm = PKIX
ssl.truststore.location = null
ssl.truststore.password = null
ssl.truststore.type = JKS
value.deserializer = class io.confluent.kafka.serializers.KafkaAvroDeserializer
21/05/26 18:33:26 INFO serializers.KafkaAvroDeserializerConfig: KafkaAvroDeserializerConfig values:
schema.registry.url = [xxx]
max.schemas.per.subject = 1000
specific.avro.reader = false
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.timestamp.type' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.output.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat.list.delimiter.regex' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.dateformat' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.partitionpath.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.delete.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.recordkey.field' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.upsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.datasource.write.keygenerator.class' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.source.kafka.topic' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.schemaprovider.registry.url' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.insert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.embed.timeline.server' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.bulkinsert.shuffle.parallelism' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.deltastreamer.keygen.timebased.input.timezone' was supplied but isn't a known config.
21/05/26 18:33:26 WARN consumer.ConsumerConfig: The configuration 'hoodie.filesystem.view.type' was supplied but isn't a known config.
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka version: 2.4.1
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka commitId: c57222ae8cd7866b
21/05/26 18:33:26 INFO utils.AppInfoParser: Kafka startTimeMs: 1622043206225
21/05/26 18:33:26 INFO clients.Metadata: [Consumer clientId=consumer-hudi_group_080-1, groupId=hudi_group_080] Cluster ID: 5XoPi9AYT0mbHVQEj6VEaw
21/05/26 18:33:27 INFO helpers.KafkaOffsetGen: SourceLimit not configured, set numEvents to default value : 5000000
21/05/26 18:33:27 INFO sources.AvroKafkaSource: About to read 0 from Kafka for topic :xxx
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: No new data, perform empty commit.
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Setting up new Hoodie Write Client
21/05/26 18:33:27 INFO deltastreamer.DeltaSync: Registering Schema :[{"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}, {"type":"record","name":"Value","namespace":"mlops911.ml_xxx.public.foo","fields":[{"name":"id","type":"int"},{"name":"date","type":["null",{"type":"string","connect.version":1,"connect.name":"io.debezium.time.ZonedTimestamp"}],"default":null},{"name":"text","type":["null","string"],"default":null},{"name":"__null_ts_ms","type":["null","long"],"default":null},{"name":"__deleted","type":["null","string"],"default":null}],"connect.name":"mlops911.ml_xxx.public.foo.Value"}]
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Starting Timeline service !!
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Overriding hostIp to (xxx) found in spark-conf. It was null
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :EMBEDDED_KV_STORE
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating embedded rocks-db based Table View
21/05/26 18:33:27 INFO util.log: Logging initialized @9978ms to org.apache.hudi.org.eclipse.jetty.util.log.Slf4jLog
21/05/26 18:33:27 INFO javalin.Javalin:
           __                      __ _
          / /____ _ _   __ ____ _ / /(_)____
     __  / // __ `/| | / // __ `// // // __ \
    / /_/ // /_/ / | |/ // /_/ // // // / / /
    \____/ \__,_/ |___/ \__,_//_//_//_/ /_/
        https://javalin.io/documentation
21/05/26 18:33:27 INFO javalin.Javalin: Starting Javalin ...
21/05/26 18:33:27 INFO javalin.Javalin: Listening on http://localhost:37089/
21/05/26 18:33:27 INFO javalin.Javalin: Javalin started in 179ms \o/
21/05/26 18:33:27 INFO service.TimelineService: Starting Timeline server on port :37089
21/05/26 18:33:27 INFO embedded.EmbeddedTimelineService: Started embedded timeline server at xxx:37089
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO client.AbstractHoodieClient: Timeline Server already running. Not restarting the service
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:27 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:27 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:27 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:27 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants []
21/05/26 18:33:28 INFO client.AbstractHoodieWriteClient: Generate a new instant time: 20210526183328 action: deltacommit
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Creating a new instant [==>20210526183328__deltacommit__REQUESTED]
21/05/26 18:33:28 INFO deltastreamer.DeltaSync: Starting commit : 20210526183328
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:28 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:28 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED]]
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:28 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:28 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:28 INFO client.AsyncCleanerService: Auto cleaning is not enabled. Not running cleaner now
21/05/26 18:33:28 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:114
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 1 (mapToPair at SparkWriteHelper.java:54) as input to shuffle 1
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Registering RDD 5 (countByKey at SparkHoodieBloomIndex.java:114) as input to shuffle 0
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Got job 0 (countByKey at SparkHoodieBloomIndex.java:114) with 2 output partitions
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Final stage: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 1)
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 6.2 KB, free 912.3 MB)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Driver requested a total number of 2 executor(s).
21/05/26 18:33:28 INFO yarn.YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
21/05/26 18:33:28 INFO yarn.YarnAllocator: Submitted 1 unlocalized container requests.
21/05/26 18:33:28 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 2)
21/05/26 18:33:28 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 3.3 KB, free 912.3 MB)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:28 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:28 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[5] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:28 INFO cluster.YarnClusterScheduler: Adding task set 1.0 with 2 tasks
21/05/26 18:33:28 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 0, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:28 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO impl.AMRMClientImpl: Received new token for : xxx:45454
21/05/26 18:33:29 INFO yarn.YarnAllocator: Launching container container_e03_1618828995116_0162_01_000004 on host xxx for executor with ID 2
21/05/26 18:33:29 INFO yarn.YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
21/05/26 18:33:29 INFO impl.ContainerManagementProtocolProxy: Opening proxy : xxx:45454
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_0 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 1.0 (TID 1, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added rdd_3_1 in memory on xxx:35696 (size: 0.0 B, free: 912.3 MB)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 0) in 1023 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 1.0 (TID 1) in 70 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ShuffleMapStage 1 (countByKey at SparkHoodieBloomIndex.java:114) finished in 1.177 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 2)
21/05/26 18:33:29 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114), which has no missing parents
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 3.8 KB, free 912.3 MB)
21/05/26 18:33:29 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.2 KB, free 912.3 MB)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 2 (ShuffledRDD[6] at countByKey at SparkHoodieBloomIndex.java:114) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Adding task set 2.0 with 2 tasks
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 2.0 (TID 2, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:29 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.246.3.9:49980
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 2.0 (TID 3, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 2.0 (TID 2) in 85 ms on xxx (executor 1) (1/2)
21/05/26 18:33:29 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 2.0 (TID 3) in 32 ms on xxx (executor 1) (2/2)
21/05/26 18:33:29 INFO cluster.YarnClusterScheduler: Removed TaskSet 2.0, whose tasks have all completed, from pool
21/05/26 18:33:29 INFO scheduler.DAGScheduler: ResultStage 2 (countByKey at SparkHoodieBloomIndex.java:114) finished in 0.126 s
21/05/26 18:33:29 INFO scheduler.DAGScheduler: Job 0 finished: countByKey at SparkHoodieBloomIndex.java:114, took 1.627903 s
21/05/26 18:33:29 INFO yarn.YarnAllocator: Driver requested a total number of 1 executor(s).
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:78
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 1 (collect at HoodieSparkEngineContext.java:78) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 3 (collect at HoodieSparkEngineContext.java:78)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 368.5 KB, free 911.9 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 101.0 KB, free 911.8 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:38417 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 3 (MapPartitionsRDD[8] at flatMap at HoodieSparkEngineContext.java:78) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 3.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 3.0 (TID 4, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on xxx:35696 (size: 101.0 KB, free: 912.2 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 3.0 (TID 4) in 178 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 3.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 3 (collect at HoodieSparkEngineContext.java:78) finished in 0.233 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 1 finished: collect at HoodieSparkEngineContext.java:78, took 0.236923 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: collect at HoodieSparkEngineContext.java:73
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 2 (collect at HoodieSparkEngineContext.java:73) with 1 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 4 (collect at HoodieSparkEngineContext.java:73)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 368.3 KB, free 911.5 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 100.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:38417 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 4 (MapPartitionsRDD[10] at map at HoodieSparkEngineContext.java:73) (first 15 tasks are for partitions Vector(0))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 4.0 with 1 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 4.0 (TID 5, xxx, executor 1, partition 0, PROCESS_LOCAL, 7710 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on xxx:35696 (size: 100.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 4.0 (TID 5) in 94 ms on xxx (executor 1) (1/1)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 4.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 4 (collect at HoodieSparkEngineContext.java:73) finished in 0.167 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 2 finished: collect at HoodieSparkEngineContext.java:73, took 0.174163 s
21/05/26 18:33:30 INFO spark.SparkContext: Starting job: countByKey at SparkHoodieBloomIndex.java:149
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Registering RDD 14 (countByKey at SparkHoodieBloomIndex.java:149) as input to shuffle 2
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Got job 3 (countByKey at SparkHoodieBloomIndex.java:149) with 2 output partitions
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Final stage: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 6)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4 stored as values in memory (estimated size 7.5 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 3.9 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:38417 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 6 (MapPartitionsRDD[14] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 6.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 6.0 (TID 6, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_4_piece0 in memory on xxx:35696 (size: 3.9 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 6.0 (TID 7, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 6.0 (TID 6) in 60 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 6.0 (TID 7) in 36 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 6.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ShuffleMapStage 6 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.121 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:30 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 7)
21/05/26 18:33:30 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149), which has no missing parents
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5 stored as values in memory (estimated size 3.8 KB, free 911.4 MB)
21/05/26 18:33:30 INFO memory.MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.4 MB)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 7 (ShuffledRDD[15] at countByKey at SparkHoodieBloomIndex.java:149) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Adding task set 7.0 with 2 tasks
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 7.0 (TID 8, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO storage.BlockManagerInfo: Added broadcast_5_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:30 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 2 to 10.246.3.9:49980
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 7.0 (TID 9, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 7.0 (TID 8) in 47 ms on xxx (executor 1) (1/2)
21/05/26 18:33:30 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 7.0 (TID 9) in 20 ms on xxx (executor 1) (2/2)
21/05/26 18:33:30 INFO cluster.YarnClusterScheduler: Removed TaskSet 7.0, whose tasks have all completed, from pool
21/05/26 18:33:30 INFO scheduler.DAGScheduler: ResultStage 7 (countByKey at SparkHoodieBloomIndex.java:149) finished in 0.081 s
21/05/26 18:33:30 INFO scheduler.DAGScheduler: Job 3 finished: countByKey at SparkHoodieBloomIndex.java:149, took 0.219895 s
21/05/26 18:33:30 INFO bloom.SparkHoodieBloomIndex: InputParallelism: ${2}, IndexParallelism: ${0}
21/05/26 18:33:30 INFO bloom.BucketizedBloomCheckPartitioner: TotalBuckets 0, min_buckets/partition 1
21/05/26 18:33:30 INFO rdd.MapPartitionsRDD: Removing RDD 3 from persistence list
21/05/26 18:33:30 INFO storage.BlockManager: Removing RDD 3
21/05/26 18:33:31 INFO rdd.MapPartitionsRDD: Removing RDD 22 from persistence list
21/05/26 18:33:31 INFO storage.BlockManager: Removing RDD 22
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: countByKey at BaseSparkCommitActionExecutor.java:158
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 16 (mapToPair at SparkHoodieBloomIndex.java:266) as input to shuffle 6
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 23 (mapToPair at SparkHoodieBloomIndex.java:287) as input to shuffle 3
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 22 (flatMapToPair at SparkHoodieBloomIndex.java:274) as input to shuffle 4
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Registering RDD 31 (countByKey at BaseSparkCommitActionExecutor.java:158) as input to shuffle 5
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Got job 4 (countByKey at BaseSparkCommitActionExecutor.java:158) with 2 output partitions
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Final stage: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 12)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6 stored as values in memory (estimated size 5.9 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 3.3 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:38417 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 10 (MapPartitionsRDD[23] at mapToPair at SparkHoodieBloomIndex.java:287) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 10.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 10.0 (TID 10, xxx, executor 1, partition 0, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_6_piece0 in memory on xxx:35696 (size: 3.3 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 1 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 10.0 (TID 11, xxx, executor 1, partition 1, PROCESS_LOCAL, 7640 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 10.0 (TID 10) in 50 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 10.0 (TID 11) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 10.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 10 (mapToPair at SparkHoodieBloomIndex.java:287) finished in 0.092 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ShuffleMapStage 12, ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7 stored as values in memory (estimated size 7.1 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:38417 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 12 (MapPartitionsRDD[31] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 12.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 12.0 (TID 12, xxx, executor 1, partition 0, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_7_piece0 in memory on xxx:35696 (size: 3.8 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 3 to 10.246.3.9:49980
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 4 to 10.246.3.9:49980
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_0 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 12.0 (TID 13, xxx, executor 1, partition 1, PROCESS_LOCAL, 7730 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 12.0 (TID 12) in 105 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added rdd_29_1 in memory on xxx:35696 (size: 0.0 B, free: 912.1 MB)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 12.0 (TID 13) in 24 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 12.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ShuffleMapStage 12 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.146 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: looking for newly runnable stages
21/05/26 18:33:31 INFO scheduler.DAGScheduler: running: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 13)
21/05/26 18:33:31 INFO scheduler.DAGScheduler: failed: Set()
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158), which has no missing parents
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8 stored as values in memory (estimated size 3.8 KB, free 911.3 MB)
21/05/26 18:33:31 INFO memory.MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 2.2 KB, free 911.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:38417 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1184
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 13 (ShuffledRDD[32] at countByKey at BaseSparkCommitActionExecutor.java:158) (first 15 tasks are for partitions Vector(0, 1))
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Adding task set 13.0 with 2 tasks
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 13.0 (TID 14, xxx, executor 1, partition 0, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Added broadcast_8_piece0 in memory on xxx:35696 (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 5 to 10.246.3.9:49980
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 13.0 (TID 15, xxx, executor 1, partition 1, PROCESS_LOCAL, 7651 bytes)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 13.0 (TID 14) in 31 ms on xxx (executor 1) (1/2)
21/05/26 18:33:31 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 13.0 (TID 15) in 12 ms on xxx (executor 1) (2/2)
21/05/26 18:33:31 INFO cluster.YarnClusterScheduler: Removed TaskSet 13.0, whose tasks have all completed, from pool
21/05/26 18:33:31 INFO scheduler.DAGScheduler: ResultStage 13 (countByKey at BaseSparkCommitActionExecutor.java:158) finished in 0.064 s
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 4 finished: countByKey at BaseSparkCommitActionExecutor.java:158, took 0.320123 s
21/05/26 18:33:31 INFO commit.BaseSparkCommitActionExecutor: Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=0, numUpdates=0}, partitionStat={}, operationType=UPSERT}
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.requested
21/05/26 18:33:31 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:31 INFO commit.UpsertPartitioner: AvgRecordSize => 1024
21/05/26 18:33:31 INFO view.AbstractTableFileSystemView: Took 3 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:31 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:31 INFO commit.UpsertPartitioner: Total Buckets :0, buckets info => {},
Partition to insert buckets => {},
UpdateLocations mapped to buckets =>{}
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 175
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 62
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 9
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 148
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 105
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 143
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 55
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 209
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 154
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 147
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 163
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 69
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 34
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 100
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 1
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 193
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 169
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 27
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 16
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 115
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 120
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 106
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 174
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 210
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 96
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 6
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 57
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 133
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 11
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 74
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 107
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 164
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 172
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 176
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 194
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 109
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 37
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 177
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 128
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 182
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 205
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 30
21/05/26 18:33:31 INFO commit.BaseCommitActionExecutor: Auto commit disabled for 20210526183328
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 102
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 180
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 150
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 186
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 89
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 223
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 47
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 158
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 162
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 88
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 39
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 8
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 29
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 124
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 75
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 165
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 217
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 134
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_5_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.1 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 35
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 216
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 22
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 114
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 152
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 42
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 94
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 145
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 126
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 144
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 168
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:38417 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_3_piece0 on xxx:35696 in memory (size: 100.9 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 149
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 38
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 70
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 15
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 118
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 166
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 207
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 170
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 171
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 65
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 5
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 97
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 110
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 222
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 87
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_6_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.2 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 192
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 201
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 117
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 123
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 12
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 60
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 84
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 127
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 91
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 136
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 45
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 200
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 64
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:38417 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_2_piece0 on xxx:35696 in memory (size: 101.0 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 92
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 0
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 81
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 185
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 214
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 21
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 31
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 67
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 112
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 178
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 208
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 78
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 73
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 131
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_8_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 61
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 3
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:38417 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO storage.BlockManagerInfo: Removed broadcast_7_piece0 on xxx:35696 in memory (size: 3.8 KB, free: 912.3 MB)
21/05/26 18:33:31 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:448
21/05/26 18:33:31 INFO scheduler.DAGScheduler: Job 5 finished: sum at DeltaSync.java:448, took 0.000044 s
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 36
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 80
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 103
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 108
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 183
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 72
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 54
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 132
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 99
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 19
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 93
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 179
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 215
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 66
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 77
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 151
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 116
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 191
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 17
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 14
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 18
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 125
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 204
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 146
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 50
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 56
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 52
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 101
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 221
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 213
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 181
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 190
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 85
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned shuffle 2
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 156
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 161
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 53
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 197
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 20
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 41
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 44
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 140
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 218
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 188
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 122
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 195
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 167
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 220
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 43
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 199
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 155
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 24
21/05/26 18:33:31 INFO spark.ContextCleaner: Cleaned accumulator 219
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 71
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 198
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 23
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 135
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 26
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 141
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 121
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 157
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 13
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 130
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned shuffle 0
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 7
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 138
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 63
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 187
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 32
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 196
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 48
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 206
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 119
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 160
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 90
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 40
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 113
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:38417 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_0_piece0 on xxx:35696 in memory (size: 3.3 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 68
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 224
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 28
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 202
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 10
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 139
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 76
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 49
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 137
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 58
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:38417 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_4_piece0 on xxx:35696 in memory (size: 3.9 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 4
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 211
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 212
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 83
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 203
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 33
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 86
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 82
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:38417 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO storage.BlockManagerInfo: Removed broadcast_1_piece0 on xxx:35696 in memory (size: 2.2 KB, free: 912.3 MB)
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 95
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 142
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 111
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 98
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 184
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 46
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 129
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 104
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 159
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 59
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 25
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 173
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 79
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 153
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 189
21/05/26 18:33:32 INFO spark.ContextCleaner: Cleaned accumulator 51
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: sum at DeltaSync.java:449
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 6 finished: sum at DeltaSync.java:449, took 0.000035 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO spark.SparkContext: Starting job: collect at SparkRDDWriteClient.java:120
21/05/26 18:33:32 INFO scheduler.DAGScheduler: Job 7 finished: collect at SparkRDDWriteClient.java:120, took 0.000039 s
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO util.CommitUtils: Creating metadata for UPSERT numWriteStats:0numReplaceFileIds:0
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__INFLIGHT]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Committing 20210526183328 action deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Marking instant complete [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Checking for file exists ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit.inflight
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Create new file for toInstant ?/user/hd_xyz/yyy/ml_xxx/foo/.hoodie/20210526183328.deltacommit
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Completed [==>20210526183328__deltacommit__INFLIGHT]
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[==>20210526183328__deltacommit__REQUESTED], [==>20210526183328__deltacommit__INFLIGHT], [20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO table.HoodieTimelineArchiveLog: No Instants to archive
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Auto cleaning is enabled. Running cleaner now
21/05/26 18:33:32 INFO client.AbstractHoodieWriteClient: Scheduling cleaning at instant time :20210526183332
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:32 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating remote view for basePath /user/hd_xyz/yyy/ml_xxx/foo. Server=xxx:37089, Timeout=300
21/05/26 18:33:32 INFO view.FileSystemViewManager: Creating InMemory based view for basePath /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:32 INFO view.AbstractTableFileSystemView: Took 0 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:32 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:32 INFO view.RemoteHoodieTableFileSystemView: Sending request : (http://xxx:37089/v1/hoodie/view/compactions/pending/?basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1)
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO collection.RocksDBDAO: DELETING RocksDB persisted at /tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11
21/05/26 18:33:33 INFO collection.RocksDBDAO: No column family found. Loading default
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:230] Creating manifest 1
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3406] Recovering from manifest file: MANIFEST-000001
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [default]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3610] Recovered from manifest file:/tmp/hoodie_timeline_rocksdb/_user_hdfs_xyz_public_ml_xxx_foo/a138e066-6b6b-4f72-8865-4c30301cbe11/MANIFEST-000001 succeeded,manifest_file_number is 1, next_file_number is 3, last_sequence is 0, log_number is 0,prev_log_number is 0,max_column_family is 0,min_log_number_to_keep is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:3618] Column family [default] (ID 0), log number is 0
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl_open.cc:1287] DB pointer 0x7f3aaccf1f20
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/version_set.cc:2936] Creating manifest 6
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_view__user_hdfs_xyz_public_ml_xxx_foo] (ID 1)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo] (ID 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_bootstrap_basefile__user_hdfs_xyz_public_ml_xxx_foo] (ID 3)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_partitions__user_hdfs_xyz_public_ml_xxx_foo] (ID 4)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 5)
21/05/26 18:33:33 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.246.4.117:53684) with ID 2
21/05/26 18:33:33 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 2)
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/column_family.cc:475] --------------- Options for column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo]:
21/05/26 18:33:33 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:1546] Created column family [hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo] (ID 6)
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_replaced_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.AbstractTableFileSystemView: Took 9 ms to read 0 instants, 0 replaced file groups
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing pending compaction operations. Count=0
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Initializing external data file mapping. Count=0
21/05/26 18:33:33 INFO util.ClusteringUtils: Found 0 files in pending clustering operations
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting file groups in pending clustering to ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb, Total file-groups=0
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix DELETE (query=part=) on hudi_pending_clustering_fg_user_hdfs_xyz_public_ml_xxx_foo
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Resetting replacedFileGroups to ROCKSDB based file-system view complete
21/05/26 18:33:33 INFO view.RocksDbBasedFileSystemView: Created ROCKSDB based file-system view at /tmp/hoodie_timeline_rocksdb
21/05/26 18:33:33 INFO collection.RocksDBDAO: Prefix Search for (query=) on hudi_pending_compaction__user_hdfs_xyz_public_ml_xxx_foo. Total Time Taken (msec)=1. Serialization Time taken(micro)=0, num entries=0
21/05/26 18:33:33 INFO service.RequestHandler: TimeTakenMillis[Total=791, Refresh=779, handle=11, Check=1], Success=true, Query=basepath=%2Fuser%2Fhdfs%2Fxyz%2Fpublic%2Fml_xxx%2Ffoo&lastinstantts=20210526183328&timelinehash=3cb19d4eacc8a39b3d4198ed17d5dac7ca1a076cc50020fab31fed29c6ccddb1, Host=xxx:37089, synced=false
21/05/26 18:33:33 INFO storage.BlockManagerMasterEndpoint: Registering block manager xxx:36920 with 912.3 MB RAM, BlockManagerId(2, xxx, 36920, None)
21/05/26 18:33:33 INFO clean.CleanPlanner: No earliest commit to retain. No need to scan partitions !!
21/05/26 18:33:33 INFO clean.CleanPlanner: Nothing to clean here. It is already clean
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaner started
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Cleaned failed attempts if any
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO client.SparkRDDWriteClient: Successfully synced to metadata table
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Committed 20210526183328
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling table service COMPACT
21/05/26 18:33:33 INFO client.AbstractHoodieWriteClient: Scheduling compaction at instant time :20210526183333
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading HoodieTableMetaClient from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO fs.FSUtils: Hadoop Configuration: fs.defaultFS: [hdfs://xxx:8020], Config:[Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, __spark_hadoop_conf__.xml], FileSystem: [DFS[DFSClient[clientName=DFSClient_NONMAPREDUCE_-862249120_1, ugi=hdfs (auth:SIMPLE)]]]
21/05/26 18:33:33 INFO table.HoodieTableConfig: Loading table properties from /user/hd_xyz/yyy/ml_xxx/foo/.hoodie/hoodie.properties
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Finished Loading Table of type MERGE_ON_READ(version=1, baseFileFormat=PARQUET) from /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO table.HoodieTableMetaClient: Loading Active commit timeline for /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO timeline.HoodieActiveTimeline: Loaded instants [[20210526183328__deltacommit__COMPLETED]]
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating View Manager with storage type :REMOTE_FIRST
21/05/26 18:33:33 INFO view.FileSystemViewManager: Creating remote first table view
21/05/26 18:33:33 INFO compact.SparkScheduleCompactionActionExecutor: Checking if compaction needs to be run on /user/hd_xyz/yyy/ml_xxx/foo
21/05/26 18:33:33 INFO deltastreamer.DeltaSync: Commit 20210526183328 successful!
21/05/26 18:33:33 INFO rdd.MapPartitionsRDD: Removing RDD 29 from persistence list
21/05/26 18:33:33 INFO storage.BlockManager: Removing RDD 29
21/05/26 18:33:34 INFO rdd.MapPartitionsRDD: Removing RDD 37 from persistence list
21/05/26 18:33:34 INFO storage.BlockManager: Removing RDD 37
21/05/26 18:33:34 INFO deltastreamer.DeltaSync: Shutting down embedded timeline server
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closing Timeline server
21/05/26 18:33:34 INFO service.TimelineService: Closing Timeline Service
21/05/26 18:33:34 INFO javalin.Javalin: Stopping Javalin ...
21/05/26 18:33:34 INFO javalin.Javalin: Javalin has stopped
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closing Rocksdb !!
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:365] Shutdown: canceling all background work
21/05/26 18:33:34 INFO collection.RocksDBDAO: From Rocks DB : [db/db_impl.cc:521] Shutdown complete
21/05/26 18:33:34 INFO view.RocksDbBasedFileSystemView: Closed Rocksdb !!
21/05/26 18:33:34 INFO service.TimelineService: Closed Timeline Service
21/05/26 18:33:34 INFO embedded.EmbeddedTimelineService: Closed Timeline server
21/05/26 18:33:34 INFO deltastreamer.HoodieDeltaStreamer: Shut down delta streamer
21/05/26 18:33:34 INFO server.AbstractConnector: Stopped Spark@7a0e94b4{HTTP/1.1,[http/1.1]}{0.0.0.0:0}
21/05/26 18:33:34 INFO ui.SparkUI: Stopped Spark web UI at http://xxx:32822
21/05/26 18:33:34 INFO yarn.YarnAllocator: Driver requested a total number of 0 executor(s).
21/05/26 18:33:34 INFO cluster.YarnClusterSchedulerBackend: Shutting down all executors
21/05/26 18:33:34 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down
21/05/26 18:33:34 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices
(serviceOption=None,
services=List(),
started=false)
21/05/26 18:33:34 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/05/26 18:33:34 INFO memory.MemoryStore: MemoryStore cleared
21/05/26 18:33:34 INFO storage.BlockManager: BlockManager stopped
21/05/26 18:33:34 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/05/26 18:33:34 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/05/26 18:33:34 INFO spark.SparkContext: Successfully stopped SparkContext
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Final app status: SUCCEEDED, exitCode: 0
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with SUCCEEDED
21/05/26 18:33:34 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
21/05/26 18:33:34 INFO yarn.ApplicationMaster: Deleting staging directory hdfs://xxx:8020/user/hd_xyz/.sparkStaging/application_1618828995116_0162
21/05/26 18:33:34 INFO util.ShutdownHookManager: Shutdown hook called
21/05/26 18:33:34 INFO util.ShutdownHookManager: Deleting directory /data/hadoop/yarn/local/usercache/hdfs/appcache/application_1618828995116_0162/spark-4c7e81b9-e526-4325-abf0-d163828b92b5
`
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] vinothchandar closed issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
vinothchandar closed issue #2959:
URL: https://github.com/apache/hudi/issues/2959
[GitHub] [hudi] PavelPetukhov edited a comment on issue #2959: No data stored after migrating to Hudi 0.8.0
Posted by GitBox <gi...@apache.org>.
PavelPetukhov edited a comment on issue #2959:
URL: https://github.com/apache/hudi/issues/2959#issuecomment-848930327
@n3nash
This is our full log:
[spark_log.txt](https://github.com/apache/hudi/files/6548394/spark_log.txt)