Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/12 11:47:25 UTC
[GitHub] [hudi] tooptoop4 opened a new issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
tooptoop4 opened a new issue #1954:
URL: https://github.com/apache/hudi/issues/1954
I'm loading data from DMS and I don't want any partitions (I did not specify hoodie.datasource.hive_sync.partition_fields, since the website says it can be left at its default, empty).
```
/home/ec2-user/spark_home/bin/spark-submit --conf "spark.hadoop.fs.s3a.proxy.host=redact" --conf "spark.hadoop.fs.s3a.proxy.port=redact" --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" --master spark://redact:7077 --deploy-mode client /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar --table-type COPY_ON_WRITE --source-ordering-field TimeCreated --source-class org.apache.hudi.utilities.sources.ParquetDFSSource --enable-hive-sync --hoodie-conf hoodie.datasource.hive_sync.database=redact --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk4 --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false --target-base-path s3a://redact/my2/multpk4 --target-table dmstest_multpk4 --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer --payload-class org.apache.hudi.payload.AWSDmsAvroPayload --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company --hoodie-conf hoodie.datasource.write.partitionpath.field=sys_user --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tblhere > multpk4.log
```
```
2020-08-12 11:31:11,186 [main] INFO org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20200812112840
2020-08-12 11:31:11,189 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Commit 20200812112840 successful!
2020-08-12 11:31:11,194 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Syncing target hoodie table with hive table(dmstest_multpk4). Hive metastore URL :jdbc:hive2://localhost:10000, basePath :s3a://redact/my2/multpk4
2020-08-12 11:31:11,960 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812112840__commit__COMPLETED]]
2020-08-12 11:31:14,264 [main] INFO org.apache.hudi.hive.HiveSyncTool - Trying to sync hoodie table dmstest_multpk4 with base path s3a://redact/my2/multpk4 of type COPY_ON_WRITE
2020-08-12 11:31:14,707 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Reading schema from s3a://redact/my2/multpk4/mpark2/7ed7627c-6110-4d42-9df2-f3a6afe877df-0_187-25-15737_20200812112840.parquet
2020-08-12 11:31:15,330 [main] INFO org.apache.hudi.hive.HiveSyncTool - Hive table dmstest_multpk4 is not found. Creating it
2020-08-12 11:31:15,337 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Creating table with CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT
SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4'
2020-08-12 11:31:15,411 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 74 ms
2020-08-12 11:31:15,444 [main] INFO hive.ql.parse.ParseDriver - Parsing command: CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apa
che.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4'
2020-08-12 11:31:16,131 [main] INFO hive.ql.parse.ParseDriver - Parse Completed
2020-08-12 11:31:16,568 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FOR
MAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4']: 1157 ms
2020-08-12 11:31:16,574 [main] INFO org.apache.hudi.hive.HiveSyncTool - Schema sync complete. Syncing partitions for dmstest_multpk4
2020-08-12 11:31:16,574 [main] INFO org.apache.hudi.hive.HiveSyncTool - Last commit time synced was found to be null
2020-08-12 11:31:16,575 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Last commit time synced is not known, listing all partitions in s3a://redact/my2/multpk4,FS :S3AFileSystem{uri=s3a://redact, workingDir=s3a://redact/user/ec2-user, inputPolicy=normal, partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='AES256', blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@62765aec, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405, available=2405, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@6f5bd362[Running, pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 6], statistics {761530 bytes read, 320081 bytes written, 712 read ops, 0 large read ops, 31 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=db54a51b-e05e-4b3c-9140-240762a0c03d-redact
} {fsURI=s3a://redact/redact/sparkevents} {files_created=5} {files_copied=0} {files_copied_bytes=0} {files_deleted=271} {fake_directories_deleted=0} {directories_created=6} {directories_deleted=0} {ignored_errors=4} {op_copy_from_local_file=0} {op_exists=53} {op_get_file_status=415} {op_glob_status=0} {op_is_directory=38} {op_is_file=0} {op_list_files=271} {op_list_located_status=0} {op_list_status=19} {op_mkdirs=5} {op_rename=0} {object_copy_requests=0} {object_delete_requests=5} {object_list_requests=680} {object_continue_list_requests=0} {object_metadata_requests=805} {object_multipart_aborted=0} {object_put_bytes=320081} {object_put_requests=10} {object_put_requests_completed=10} {stream_write_failures=0} {stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} {stream_write_block_uploads_aborted=0} {stream_write_total_time=0} {stream_write_total_data=320081} {object_put_requests_active=0} {object_put_bytes_pending=0} {stream_write_block_uploads_active=0} {stream_
write_block_uploads_pending=4} {stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0} {stream_opened=22} {stream_bytes_skipped_on_seek=0} {stream_closed=22} {stream_bytes_backwards_on_seek=437965} {stream_bytes_read=761530} {stream_read_operations_incomplete=107} {stream_bytes_discarded_in_abort=0} {stream_close_operations=22} {stream_read_operations=3020} {stream_aborted=0} {stream_forward_seek_operations=0} {stream_backward_seek_operations=1} {stream_seek_operations=1} {stream_bytes_read_in_close=8} {stream_read_exceptions=0} }}
2020-08-12 11:31:34,438 [main] INFO org.apache.hudi.hive.HiveSyncTool - Storage partitions scan complete. Found 271
2020-08-12 11:31:34,476 [main] INFO org.apache.hudi.hive.HiveSyncTool - New Partitions [AAB, redactlist]
2020-08-12 11:31:34,476 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Adding partitions 271 to table dmstest_multpk4
2020-08-12 11:31:34,477 [main] ERROR org.apache.hudi.hive.HiveSyncTool - Got runtime exception when hive syncing
org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table dmstest_multpk4
at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:187)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:126)
at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:460)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:402)
at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:235)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values [AAB]. Check partition strategy.
at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
at org.apache.hudi.hive.HoodieHiveClient.getPartitionClause(HoodieHiveClient.java:182)
at org.apache.hudi.hive.HoodieHiveClient.constructAddPartitions(HoodieHiveClient.java:166)
at org.apache.hudi.hive.HoodieHiveClient.addPartitionsToTable(HoodieHiveClient.java:141)
at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:182)
... 19 more
2020-08-12 11:31:34,513 [main] INFO org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Shut down deltastreamer
2020-08-12 11:31:34,535 [main] INFO org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down all executors
```
```
aws s3 ls s3://redact/my2/multpk4/
PRE .hoodie/
PRE AAB/
PRE CC/
PRE DD/
...etc
```
[GitHub] [hudi] bvaradar closed issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
bvaradar closed issue #1954:
URL: https://github.com/apache/hudi/issues/1954
[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
bvaradar commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-685782917
@tooptoop4 : @satishkotha has merged the fix for ComplexKeyGenerator. Did you get a chance to try it?
[GitHub] [hudi] satishkotha commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
satishkotha commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-680262762
If a single column as key works for you, you can also try:
```
hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
hoodie.datasource.write.recordkey.field=(new column that is unique)
```
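For concreteness, here is a sketch of how those settings would slot into the spark-submit used in this thread (same redacted hosts and paths; the target table name `dmstest_np` and the `Id` record key are placeholders chosen for illustration, with `Id` taken from the table schema above as an assumed-unique column). On the org.apache.hudi 0.5.3 bundle used here, the equivalent classes should be org.apache.hudi.keygen.NonpartitionedKeyGenerator and org.apache.hudi.hive.NonPartitionedExtractor:
```
/home/ec2-user/spark_home/bin/spark-submit \
  --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
  --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
  --master spark://redact:7077 --deploy-mode client \
  /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
  --table-type COPY_ON_WRITE \
  --source-ordering-field TimeCreated \
  --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
  --enable-hive-sync \
  --target-base-path s3a://redact/my2/dmstest_np \
  --target-table dmstest_np \
  --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
  --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
  --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.NonpartitionedKeyGenerator \
  --hoodie-conf hoodie.datasource.write.recordkey.field=Id \
  --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor \
  --hoodie-conf hoodie.datasource.hive_sync.database=redact \
  --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_np \
  --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
  --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl
```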
[GitHub] [hudi] satishkotha commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
satishkotha commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-680259178
@tooptoop4
For non-partitioned tables, data is typically stored in the base directory (s3://redact/my2/multpk7/). It looks like the partitionpath field you specified is being interpreted incorrectly, so the data is being stored under a 'default' partition. You also specified 'NonPartitionedExtractor' for hive sync, so the 'default' partition is not registered with Hive.
ComplexKeyGenerator doesn't seem to work well with non-partitioned tables. I tried making it work with this code change: https://github.com/apache/hudi/pull/2037. Can you apply this patch and let me know if it works?
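In case it helps, a rough sketch of checking out that PR and rebuilding the utilities bundle (this assumes a local clone, Maven, and the 0.5.x module layout with the bundle under packaging/hudi-utilities-bundle):
```
# fetch the PR branch from GitHub's pull refs and build a patched bundle
git clone https://github.com/apache/hudi.git && cd hudi
git fetch origin pull/2037/head:pr-2037 && git checkout pr-2037
mvn clean package -DskipTests
# rebuilt jar to pass to spark-submit in place of the stock bundle:
ls packaging/hudi-utilities-bundle/target/hudi-utilities-bundle_*.jar
```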
[GitHub] [hudi] tooptoop4 commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-672843556
Got a bit further with the below: the Hudi/Spark job now succeeds, but the Hive DDL points at the wrong S3 location, so a select from Hive/Presto gives an error. When I manually alter the S3 location in the table DDL via hiveserver2 it works (i.e. change LOCATION 's3a://redact/my2/multpk7' to LOCATION 's3a://redact/my2/multpk7/default'), so I think a code change is needed to create the table at the proper S3 location (a beeline sketch of that manual fix appears after the listings below).
```
/home/ec2-user/spark_home/bin/spark-submit --conf "spark.hadoop.fs.s3a.proxy.host=redact" --conf "spark.hadoop.fs.s3a.proxy.port=redact" --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" --master spark://redact:7077 --deploy-mode client /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar --table-type COPY_ON_WRITE --source-ordering-field TimeCreated --source-class org.apache.hudi.utilities.sources.ParquetDFSSource --enable-hive-sync --hoodie-conf hoodie.datasource.hive_sync.database=redact --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk7 --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false --target-base-path s3a://redact/my2/multpk7 --target-table dmstest_multpk7 --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer --payload-class org.apache.hudi.payload.AWSDmsAvroPayload --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company --hoodie-conf "hoodie.datasource.write.partitionpath.field=" --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl > multpk7.log
OK
```
cat multpk7.log
```
2020-08-12 12:18:15,375 [main] WARN org.apache.hudi.utilities.deltastreamer.SchedulerConfGenerator - Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
2020-08-12 12:18:16,386 [dispatcher-event-loop-3] INFO org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Connected to Spark cluster with app ID app-20200812121816-0086
2020-08-12 12:18:17,199 [main] INFO com.amazonaws.http.AmazonHttpClient - Configuring Proxy. redact
2020-08-12 12:18:18,154 [main] INFO org.apache.spark.scheduler.EventLoggingListener - Logging events to s3a://redact/sparkevents/app-20200812121816-0086
2020-08-12 12:18:18,171 [dispatcher-event-loop-2] INFO org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Granted executor ID app-20200812121816-0086/0 on hostPort redact:19629 with 4 core(s), 7.9 GB RAM
2020-08-12 12:18:18,195 [main] INFO org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
2020-08-12 12:18:18,427 [main] WARN org.apache.spark.SparkContext - Using an existing SparkContext; some configuration may not take effect.
2020-08-12 12:18:18,526 [main] ERROR org.apache.hudi.common.util.DFSPropertiesConfiguration - Error reading in properies from dfs
java.io.FileNotFoundException: File file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
at org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:97)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:91)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
2020-08-12 12:18:18,528 [main] WARN org.apache.hudi.utilities.UtilHelpers - Unexpected error read props file at :file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties
java.lang.IllegalArgumentException: Cannot read properties from dfs
at org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:91)
at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:97)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:91)
at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.FileNotFoundException: File file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties does not exist
at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
at org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
... 19 more
2020-08-12 12:18:18,528 [main] INFO org.apache.hudi.utilities.UtilHelpers - Adding overridden properties to file properties.
2020-08-12 12:18:18,529 [main] INFO org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Creating delta streamer with configs : {hoodie.datasource.hive_sync.use_jdbc=false, hoodie.datasource.write.recordkey.field=version_no,group_company, hoodie.datasource.write.partitionpath.field=, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator, hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor, hoodie.datasource.hive_sync.table=dmstest_multpk7, hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl, hoodie.datasource.hive_sync.database=redact}
2020-08-12 12:18:18,533 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Creating delta streamer with configs : {hoodie.datasource.hive_sync.use_jdbc=false, hoodie.datasource.write.recordkey.field=version_no,group_company, hoodie.datasource.write.partitionpath.field=, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator, hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor, hoodie.datasource.hive_sync.table=dmstest_multpk7, hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl, hoodie.datasource.hive_sync.database=redact}
2020-08-12 12:18:19,798 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Setting up Hoodie Write Client
2020-08-12 12:18:19,799 [main] INFO org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Delta Streamer running only single round
2020-08-12 12:18:20,218 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
2020-08-12 12:18:20,222 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Checkpoint to resume from : Option{val=null}
2020-08-12 12:18:42,136 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Setting up Hoodie Write Client
2020-08-12 12:18:42,156 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Registering Schema :[{"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"Op","type":["string","null"]},{"name":"Id","type":["int","null"]},{"name":"AuditProcessHistoryId","type":["int","null"]},{"name":"org_id","type":["int","null"]},{"name":"org_name","type":["string","null"]},{"name":"org_sname","type":["string","null"]},{"name":"org_mnem","type":["string","null"]},{"name":"org_parent","type":["int","null"]},{"name":"percent_holding","type":["double","null"]},{"name":"group_company","type":["string","null"]},{"name":"grp_ord_for_cln","type":["string","null"]},{"name":"mkt_only","type":["string","null"]},{"name":"pro_rate_ind","type":["string","null"]},{"name":"show_shapes","type":["string","null"]},{"name":"sec_code_pref","type":["string","null"]},{"name":"alert_org_ref","type":["string","null"]},{"name":"swift_bic","type":["string","null"]},{"name":"exec_b
reakdown","type":["string","null"]},{"name":"notes","type":["string","null"]},{"name":"active","type":["string","null"]},{"name":"version_no","type":["int","null"]},{"name":"sys_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"sys_user","type":["string","null"]},{"name":"create_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"cntry_of_dom","type":["string","null"]},{"name":"client","type":["string","null"]},{"name":"alert_acronym","type":["string","null"]},{"name":"oneoff_client","type":["string","null"]},{"name":"booking_domicile","type":["string","null"]},{"name":"booking_dom_list","type":["string","null"]},{"name":"TimeCreated","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"UserCreated","type":["string","null"]}]}, {"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"Op","type":["string","null"]},{"name":"Id","type":["int","null"]},{"name":"AuditProcessHis
toryId","type":["int","null"]},{"name":"org_id","type":["int","null"]},{"name":"org_name","type":["string","null"]},{"name":"org_sname","type":["string","null"]},{"name":"org_mnem","type":["string","null"]},{"name":"org_parent","type":["int","null"]},{"name":"percent_holding","type":["double","null"]},{"name":"group_company","type":["string","null"]},{"name":"grp_ord_for_cln","type":["string","null"]},{"name":"mkt_only","type":["string","null"]},{"name":"pro_rate_ind","type":["string","null"]},{"name":"show_shapes","type":["string","null"]},{"name":"sec_code_pref","type":["string","null"]},{"name":"alert_org_ref","type":["string","null"]},{"name":"swift_bic","type":["string","null"]},{"name":"exec_breakdown","type":["string","null"]},{"name":"notes","type":["string","null"]},{"name":"active","type":["string","null"]},{"name":"version_no","type":["int","null"]},{"name":"sys_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"sys_user","type":["string","nu
ll"]},{"name":"create_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"cntry_of_dom","type":["string","null"]},{"name":"client","type":["string","null"]},{"name":"alert_acronym","type":["string","null"]},{"name":"oneoff_client","type":["string","null"]},{"name":"booking_domicile","type":["string","null"]},{"name":"booking_dom_list","type":["string","null"]},{"name":"TimeCreated","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"UserCreated","type":["string","null"]}]}]
2020-08-12 12:18:50,361 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
2020-08-12 12:18:50,934 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
2020-08-12 12:18:50,937 [main] INFO org.apache.hudi.client.HoodieWriteClient - Generate a new instant time 20200812121850
2020-08-12 12:18:51,226 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
2020-08-12 12:18:51,234 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Creating a new instant [==>20200812121850__commit__REQUESTED]
2020-08-12 12:18:51,415 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Starting commit : 20200812121850
2020-08-12 12:18:51,699 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__REQUESTED]]
2020-08-12 12:18:51,982 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__REQUESTED]]
2020-08-12 12:19:21,501 [main] INFO org.apache.hudi.index.bloom.HoodieBloomIndex - InputParallelism: ${1500}, IndexParallelism: ${0}
2020-08-12 12:19:32,817 [main] INFO org.apache.hudi.client.HoodieWriteClient - Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=103, numUpdates=0}, partitionStat={default=WorkloadStat {numInserts=103, numUpdates=0}}}
2020-08-12 12:19:32,841 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file exists ?s3a://redact/my2/multpk7/.hoodie/20200812121850.commit.requested
2020-08-12 12:19:33,081 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file for toInstant ?s3a://redact/my2/multpk7/.hoodie/20200812121850.inflight
2020-08-12 12:19:33,082 [main] INFO org.apache.hudi.table.HoodieCopyOnWriteTable - AvgRecordSize => 1024
2020-08-12 12:19:33,184 [main] INFO org.apache.hudi.table.HoodieCopyOnWriteTable - For partitionPath : default Small Files => []
2020-08-12 12:19:33,184 [main] INFO org.apache.hudi.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 103, totalInsertBuckets => 1, recordsPerBucket => 122880
2020-08-12 12:19:33,185 [main] INFO org.apache.hudi.table.HoodieCopyOnWriteTable - Total insert buckets for partition path default => [WorkloadStat {bucketNumber=0, weight=1.0}]
2020-08-12 12:19:33,186 [main] INFO org.apache.hudi.table.HoodieCopyOnWriteTable - Total Buckets :1, buckets info => {0=BucketInfo {bucketType=INSERT, fileIdPrefix=a9ab6f7a-4def-490a-aac0-49e15ee9d742}},
Partition to insert buckets => {default=[WorkloadStat {bucketNumber=0, weight=1.0}]},
UpdateLocations mapped to buckets =>{}
2020-08-12 12:19:33,206 [main] INFO org.apache.hudi.client.AbstractHoodieWriteClient - Auto commit disabled for 20200812121850
2020-08-12 12:19:41,179 [main] INFO org.apache.hudi.client.AbstractHoodieWriteClient - Commiting 20200812121850
2020-08-12 12:19:41,502 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__INFLIGHT]]
2020-08-12 12:19:41,777 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__INFLIGHT]]
2020-08-12 12:19:42,140 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__INFLIGHT]]
2020-08-12 12:19:42,479 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__INFLIGHT]]
2020-08-12 12:19:42,706 [main] INFO org.apache.hudi.table.HoodieTable - Removing marker directory=s3a://redact/my2/multpk7/.hoodie/.temp/20200812121850
2020-08-12 12:19:43,027 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Marking instant complete [==>20200812121850__commit__INFLIGHT]
2020-08-12 12:19:43,027 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file exists ?s3a://redact/my2/multpk7/.hoodie/20200812121850.inflight
2020-08-12 12:19:43,356 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file for toInstant ?s3a://redact/my2/multpk7/.hoodie/20200812121850.commit
2020-08-12 12:19:43,357 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Completed [==>20200812121850__commit__INFLIGHT]
2020-08-12 12:19:43,745 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,010 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,084 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__REQUESTED], [==>20200812121850__commit__INFLIGHT], [20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,085 [main] INFO org.apache.hudi.table.HoodieCommitArchiveLog - No Instants to archive
2020-08-12 12:19:44,086 [main] INFO org.apache.hudi.client.HoodieWriteClient - Auto cleaning is enabled. Running cleaner now
2020-08-12 12:19:44,356 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,629 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:44,912 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:45,321 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:45,337 [main] INFO org.apache.hudi.table.CleanHelper - No earliest commit to retain. No need to scan partitions !!
2020-08-12 12:19:45,337 [main] INFO org.apache.hudi.table.HoodieCopyOnWriteTable - Nothing to clean here. It is already clean
2020-08-12 12:19:45,374 [main] INFO org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20200812121850
2020-08-12 12:19:45,374 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Commit 20200812121850 successful!
2020-08-12 12:19:45,375 [main] INFO org.apache.hudi.utilities.deltastreamer.DeltaSync - Syncing target hoodie table with hive table(dmstest_multpk7). Hive metastore URL :jdbc:hive2://localhost:10000, basePath :s3a://redact/my2/multpk7
2020-08-12 12:19:45,636 [main] INFO org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
2020-08-12 12:19:46,806 [main] INFO org.apache.hudi.hive.HiveSyncTool - Trying to sync hoodie table dmstest_multpk7 with base path s3a://redact/my2/multpk7 of type COPY_ON_WRITE
2020-08-12 12:19:46,864 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Reading schema from s3a://redact/my2/multpk7/default/a9ab6f7a-4def-490a-aac0-49e15ee9d742-0_0-25-15010_20200812121850.parquet
2020-08-12 12:19:47,064 [main] INFO org.apache.hudi.hive.HiveSyncTool - Hive table dmstest_multpk7 is not found. Creating it
2020-08-12 12:19:47,070 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Creating table with CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk7`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT
SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk7'
2020-08-12 12:19:47,151 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 81 ms
2020-08-12 12:19:47,186 [main] INFO hive.ql.parse.ParseDriver - Parsing command: CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk7`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apa
che.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk7'
2020-08-12 12:19:47,874 [main] INFO hive.ql.parse.ParseDriver - Parse Completed
2020-08-12 12:19:48,323 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [CREATE EXTERNAL TABLE IF NOT EXISTS `redact`.`dmstest_multpk7`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FOR
MAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk7']: 1171 ms
2020-08-12 12:19:48,329 [main] INFO org.apache.hudi.hive.HiveSyncTool - Schema sync complete. Syncing partitions for dmstest_multpk7
2020-08-12 12:19:48,329 [main] INFO org.apache.hudi.hive.HiveSyncTool - Last commit time synced was found to be null
2020-08-12 12:19:48,330 [main] INFO org.apache.hudi.hive.HoodieHiveClient - Last commit time synced is not known, listing all partitions in s3a://redact/my2/multpk7,FS :S3AFileSystem{uri=s3a://redact, workingDir=s3a://redact/user/ec2-user, inputPolicy=normal, partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='AES256', blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@62765aec, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405, available=2405, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@6f5bd362[Running, pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 6], statistics {445890 bytes read, 4324 bytes written, 172 read ops, 0 large read ops, 31 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=aad8f6ce-2b40-4ddb-9b9b-4e82033cb193-redact}
{fsURI=s3a://redact/sparkevents} {files_created=5} {files_copied=0} {files_copied_bytes=0} {files_deleted=1} {fake_directories_deleted=0} {directories_created=6} {directories_deleted=0} {ignored_errors=4} {op_copy_from_local_file=0} {op_exists=53} {op_get_file_status=145} {op_glob_status=0} {op_is_directory=38} {op_is_file=0} {op_list_files=1} {op_list_located_status=0} {op_list_status=19} {op_mkdirs=5} {op_rename=0} {object_copy_requests=0} {object_delete_requests=5} {object_list_requests=140} {object_continue_list_requests=0} {object_metadata_requests=265} {object_multipart_aborted=0} {object_put_bytes=4324} {object_put_requests=10} {object_put_requests_completed=10} {stream_write_failures=0} {stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} {stream_write_block_uploads_aborted=0} {stream_write_total_time=0} {stream_write_total_data=4324} {object_put_requests_active=0} {object_put_bytes_pending=0} {stream_write_block_uploads_active=0} {stream_write_block_uploa
ds_pending=4} {stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0} {stream_opened=22} {stream_bytes_skipped_on_seek=0} {stream_closed=22} {stream_bytes_backwards_on_seek=438082} {stream_bytes_read=445890} {stream_read_operations_incomplete=71} {stream_bytes_discarded_in_abort=0} {stream_close_operations=22} {stream_read_operations=2764} {stream_aborted=0} {stream_forward_seek_operations=0} {stream_backward_seek_operations=1} {stream_seek_operations=1} {stream_bytes_read_in_close=8} {stream_read_exceptions=0} }}
2020-08-12 12:19:48,584 [main] INFO org.apache.hudi.hive.HiveSyncTool - Storage partitions scan complete. Found 1
2020-08-12 12:19:48,613 [main] INFO org.apache.hudi.hive.HiveSyncTool - New Partitions []
2020-08-12 12:19:48,614 [main] INFO org.apache.hudi.hive.HoodieHiveClient - No partitions to add for dmstest_multpk7
2020-08-12 12:19:48,614 [main] INFO org.apache.hudi.hive.HiveSyncTool - Changed Partitions []
2020-08-12 12:19:48,614 [main] INFO org.apache.hudi.hive.HoodieHiveClient - No partitions to change for dmstest_multpk7
2020-08-12 12:19:49,002 [main] INFO org.apache.hudi.hive.HiveSyncTool - Sync complete for dmstest_multpk7
2020-08-12 12:19:49,031 [main] INFO org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Shut down deltastreamer
2020-08-12 12:19:49,044 [main] INFO org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down all executors
```
```
aws s3 ls s3://redact/my2/multpk7/
PRE .hoodie/
PRE default/
aws s3 ls s3://redact/my2/multpk7/default/
2020-08-12 12:19:39 93 .hoodie_partition_metadata
2020-08-12 12:19:41 452644 a9ab6f7a-4def-490a-aac0-49e15ee9d742-0_0-25-15010_20200812121850.parquet
```
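For reference, the manual fix mentioned at the top of this comment would look roughly like the following through beeline (a sketch; the JDBC URL is the metastore URL from the logs above, and the statement is a standard HiveQL ALTER TABLE ... SET LOCATION):
```
# point the Hive table at the 'default' partition directory where the data actually landed
beeline -u jdbc:hive2://localhost:10000 -e \
  "ALTER TABLE \`redact\`.\`dmstest_multpk7\` SET LOCATION 's3a://redact/my2/multpk7/default'"
```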
[GitHub] [hudi] tooptoop4 commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-672831272
Even with NonPartitionedExtractor I am getting the same issue.
[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
bvaradar commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-680168375
@satishkotha : Would you be able to help reproduce this?
[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
bvaradar commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-691745743
Closing this issue.
[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
bvaradar commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-679305798
@tooptoop4 : IIUC, are you effectively changing a table from non-partitioned to partitioned? The exception you added in the last comment was about a missing file, which does not tie up with your comments. Can you elaborate on the steps to reproduce this?
[GitHub] [hudi] tooptoop4 commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values
tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-679353671
@bvaradar in each comment I am trying brand-new tables with different spark-submits, so I am not changing an existing table.
Try to reproduce with:
```
/home/ec2-user/spark_home/bin/spark-submit --conf "spark.hadoop.fs.s3a.proxy.host=redact" --conf "spark.hadoop.fs.s3a.proxy.port=redact" --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" --master spark://redact:7077 --deploy-mode client /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar --table-type COPY_ON_WRITE --source-ordering-field TimeCreated --source-class org.apache.hudi.utilities.sources.ParquetDFSSource --enable-hive-sync --hoodie-conf hoodie.datasource.hive_sync.database=redact --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk7 --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false --target-base-path s3a://redact/my2/multpk7 --target-table dmstest_multpk7 --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer --payload-class org.apache.hudi.payload.AWSDmsAvroPayload --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company --hoodie-conf "hoodie.datasource.write.partitionpath.field=" --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl
```