Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/08/12 11:47:25 UTC

[GitHub] [hudi] tooptoop4 opened a new issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

tooptoop4 opened a new issue #1954:
URL: https://github.com/apache/hudi/issues/1954


   I'm loading data from DMS and I don't want any partitions (I did not specify hoodie.datasource.hive_sync.partition_fields, since the website says it can be left at its empty default).
   
   ```
   /home/ec2-user/spark_home/bin/spark-submit \
     --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
     --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
     --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
     --master spark://redact:7077 \
     --deploy-mode client \
     /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TimeCreated \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --enable-hive-sync \
     --hoodie-conf hoodie.datasource.hive_sync.database=redact \
     --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk4 \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.MultiPartKeysValueExtractor \
     --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
     --target-base-path s3a://redact/my2/multpk4 \
     --target-table dmstest_multpk4 \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
     --hoodie-conf hoodie.datasource.write.partitionpath.field=sys_user \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tblhere > multpk4.log
   ```
   
   ```
   2020-08-12 11:31:11,186 [main] INFO  org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20200812112840
   2020-08-12 11:31:11,189 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Commit 20200812112840 successful!
   2020-08-12 11:31:11,194 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Syncing target hoodie table with hive table(dmstest_multpk4). Hive metastore URL :jdbc:hive2://localhost:10000, basePath :s3a://redact/my2/multpk4
   2020-08-12 11:31:11,960 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812112840__commit__COMPLETED]]
   2020-08-12 11:31:14,264 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Trying to sync hoodie table dmstest_multpk4 with base path s3a://redact/my2/multpk4 of type COPY_ON_WRITE
   2020-08-12 11:31:14,707 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Reading schema from s3a://redact/my2/multpk4/mpark2/7ed7627c-6110-4d42-9df2-f3a6afe877df-0_187-25-15737_20200812112840.parquet
   2020-08-12 11:31:15,330 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Hive table dmstest_multpk4 is not found. Creating it
   2020-08-12 11:31:15,337 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Creating table with CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4'
   2020-08-12 11:31:15,411 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 74 ms
   2020-08-12 11:31:15,444 [main] INFO  hive.ql.parse.ParseDriver - Parsing command: CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4'
   2020-08-12 11:31:16,131 [main] INFO  hive.ql.parse.ParseDriver - Parse Completed
   2020-08-12 11:31:16,568 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk4`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk4']: 1157 ms
   2020-08-12 11:31:16,574 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Schema sync complete. Syncing partitions for dmstest_multpk4
   2020-08-12 11:31:16,574 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Last commit time synced was found to be null
   2020-08-12 11:31:16,575 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Last commit time synced is not known, listing all partitions in s3a://redact/my2/multpk4,FS :S3AFileSystem{uri=s3a://redact, workingDir=s3a://redact/user/ec2-user, inputPolicy=normal, partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='AES256', blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@62765aec, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405, available=2405, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@6f5bd362[Running, pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 6], statistics {761530 bytes read, 320081 bytes written, 712 read ops, 0 large read ops, 31 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=db54a51b-e05e-4b3c-9140-240762a0c03d-redact} {fsURI=s3a://redact/redact/sparkevents} {files_created=5} {files_copied=0} {files_copied_bytes=0} {files_deleted=271} {fake_directories_deleted=0} {directories_created=6} {directories_deleted=0} {ignored_errors=4} {op_copy_from_local_file=0} {op_exists=53} {op_get_file_status=415} {op_glob_status=0} {op_is_directory=38} {op_is_file=0} {op_list_files=271} {op_list_located_status=0} {op_list_status=19} {op_mkdirs=5} {op_rename=0} {object_copy_requests=0} {object_delete_requests=5} {object_list_requests=680} {object_continue_list_requests=0} {object_metadata_requests=805} {object_multipart_aborted=0} {object_put_bytes=320081} {object_put_requests=10} {object_put_requests_completed=10} {stream_write_failures=0} {stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} {stream_write_block_uploads_aborted=0} {stream_write_total_time=0} {stream_write_total_data=320081} {object_put_requests_active=0} {object_put_bytes_pending=0} {stream_write_block_uploads_active=0} {stream_write_block_uploads_pending=4} {stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0} {stream_opened=22} {stream_bytes_skipped_on_seek=0} {stream_closed=22} {stream_bytes_backwards_on_seek=437965} {stream_bytes_read=761530} {stream_read_operations_incomplete=107} {stream_bytes_discarded_in_abort=0} {stream_close_operations=22} {stream_read_operations=3020} {stream_aborted=0} {stream_forward_seek_operations=0} {stream_backward_seek_operations=1} {stream_seek_operations=1} {stream_bytes_read_in_close=8} {stream_read_exceptions=0} }}
   2020-08-12 11:31:34,438 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Storage partitions scan complete. Found 271
   2020-08-12 11:31:34,476 [main] INFO  org.apache.hudi.hive.HiveSyncTool - New Partitions [AAB, redactlist]
   2020-08-12 11:31:34,476 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Adding partitions 271 to table dmstest_multpk4
   2020-08-12 11:31:34,477 [main] ERROR org.apache.hudi.hive.HiveSyncTool - Got runtime exception when hive syncing
   org.apache.hudi.hive.HoodieHiveSyncException: Failed to sync partitions for table dmstest_multpk4
           at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:187)
           at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:126)
           at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:87)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncHive(DeltaSync.java:460)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.writeToSink(DeltaSync.java:402)
           at org.apache.hudi.utilities.deltastreamer.DeltaSync.syncOnce(DeltaSync.java:235)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.sync(HoodieDeltaStreamer.java:123)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values [AAB]. Check partition strategy.
           at org.apache.hudi.common.util.ValidationUtils.checkArgument(ValidationUtils.java:40)
           at org.apache.hudi.hive.HoodieHiveClient.getPartitionClause(HoodieHiveClient.java:182)
           at org.apache.hudi.hive.HoodieHiveClient.constructAddPartitions(HoodieHiveClient.java:166)
           at org.apache.hudi.hive.HoodieHiveClient.addPartitionsToTable(HoodieHiveClient.java:141)
           at org.apache.hudi.hive.HiveSyncTool.syncPartitions(HiveSyncTool.java:182)
           ... 19 more
   2020-08-12 11:31:34,513 [main] INFO  org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Shut down deltastreamer
   2020-08-12 11:31:34,535 [main] INFO  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down all executors
   ```
   ```
   aws s3 ls s3://redact/my2/multpk4/
                              PRE .hoodie/
                              PRE AAB/
                              PRE CC/
                              PRE DD/
                              ...etc
   ```
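   
   For reference, the check that throws here compares the configured hive-sync partition fields against the values the extractor pulls out of each storage path. Below is a minimal standalone sketch of that validation (simplified from HoodieHiveClient.getPartitionClause; the class and method names are illustrative, not the exact 0.5.3 source):
   
   ```
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;
   
   // Sketch of the hive-sync partition validation that fails above. With
   // hoodie.datasource.hive_sync.partition_fields left empty, the table has zero
   // partition columns, but MultiPartKeysValueExtractor still extracts one value
   // per path segment (e.g. "AAB"), so the sizes differ and the exception fires.
   public class PartitionClauseSketch {
     static void checkPartition(List<String> partitionFields, List<String> partitionValues) {
       if (partitionFields.size() != partitionValues.size()) {
         throw new IllegalArgumentException("Partition key parts " + partitionFields
             + " does not match with partition values " + partitionValues
             + ". Check partition strategy.");
       }
     }
   
     public static void main(String[] args) {
       // Reproduces the mismatch from the log: [] vs [AAB]
       checkPartition(Collections.emptyList(), Arrays.asList("AAB"));
     }
   }
   ```
   
   In this run the write side partitioned the data by sys_user (hence the AAB/, CC/, DD/ folders), while the sync side was told there are no partition fields, which is exactly the [] vs [AAB] mismatch.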



[GitHub] [hudi] bvaradar closed issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

bvaradar closed issue #1954:
URL: https://github.com/apache/hudi/issues/1954


   



[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

bvaradar commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-685782917


   @tooptoop4: @satishkotha has merged the fix for ComplexKeyGenerator. Did you get a chance to try it?



[GitHub] [hudi] satishkotha commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

satishkotha commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-680262762


   If a single column as the key works for you, you can also try:
   
   hoodie.datasource.write.keygenerator.class=com.uber.hoodie.NonpartitionedKeyGenerator
   hoodie.datasource.hive_sync.partition_extractor_class=com.uber.hoodie.hive.NonPartitionedExtractor
   hoodie.datasource.write.recordkey.field=(new column that is unique)
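   
   Roughly, the difference is in the partition path the key generator emits. Here is a sketch of the non-partitioned variant (an illustrative class showing the assumed behaviour, not the exact source shipped in the bundle):
   
   ```
   import org.apache.avro.generic.GenericRecord;
   import org.apache.hudi.common.model.HoodieKey;
   
   // Illustrative sketch of a non-partitioned key generator: the partition path
   // is always empty, so files land directly under the table base path and
   // NonPartitionedExtractor has no partition values to mismatch during hive sync.
   public class NonpartitionedKeyGeneratorSketch {
     private final String recordKeyField; // e.g. the "(new column that is unique)" above
   
     public NonpartitionedKeyGeneratorSketch(String recordKeyField) {
       this.recordKeyField = recordKeyField;
     }
   
     public HoodieKey getKey(GenericRecord record) {
       String recordKey = String.valueOf(record.get(recordKeyField));
       return new HoodieKey(recordKey, ""); // empty partition path = non-partitioned layout
     }
   }
   ```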
   



[GitHub] [hudi] satishkotha commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

satishkotha commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-680259178


   @tooptoop4  
   For non-partitioned tables, data is typically stored in the base directory (s3://redact/my2/multpk7/). It looks like the partitionpath field you specified is being interpreted incorrectly, so the data is stored under a 'default' partition. You also specified 'NonPartitionedExtractor' for hive sync, so the 'default' partition is not registered with Hive.
   
   ComplexKeyGenerator doesn't seem to work well with non-partitioned tables. I tried making it work with this code change: https://github.com/apache/hudi/pull/2037. Can you apply this patch and let me know if it works?
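   
   To illustrate the 'default' behaviour described above: with an empty partitionpath field, the complex key generator ends up with an empty partition value and appears to substitute the literal segment "default", which is why the files land under basePath/default/. A rough sketch of that fallback (inferred from the directory listings elsewhere in this thread, not the exact 0.5.3 source):
   
   ```
   import java.util.Arrays;
   import java.util.List;
   
   // Rough sketch of the fallback inferred from the runs in this thread: an empty
   // or unresolvable partition value becomes the literal segment "default", so
   // hoodie.datasource.write.partitionpath.field= still yields basePath/default/
   // instead of writing into basePath directly.
   public class ComplexPartitionPathSketch {
     static final String DEFAULT_PARTITION_PATH = "default";
   
     static String partitionPath(List<String> partitionValues) {
       StringBuilder path = new StringBuilder();
       for (String value : partitionValues) {
         if (path.length() > 0) {
           path.append("/");
         }
         path.append(value == null || value.isEmpty() ? DEFAULT_PARTITION_PATH : value);
       }
       return path.toString();
     }
   
     public static void main(String[] args) {
       System.out.println(partitionPath(Arrays.asList(""))); // prints "default"
     }
   }
   ```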



[GitHub] [hudi] tooptoop4 commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-672843556


   I got a bit further with the command below: the Hudi/Spark job now succeeds, but the Hive DDL points at the wrong S3 location, so a select from Hive/Presto gives an error. When I manually alter the S3 location in the table DDL via HiveServer2 it works (i.e. changing LOCATION 's3a://redact/my2/multpk7' to LOCATION 's3a://redact/my2/multpk7/default'), so I think a code change is needed to create the table at the proper S3 location.
   
   ```
   /home/ec2-user/spark_home/bin/spark-submit \
     --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
     --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
     --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
     --master spark://redact:7077 \
     --deploy-mode client \
     /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TimeCreated \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --enable-hive-sync \
     --hoodie-conf hoodie.datasource.hive_sync.database=redact \
     --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk7 \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor \
     --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
     --target-base-path s3a://redact/my2/multpk7 \
     --target-table dmstest_multpk7 \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
     --hoodie-conf "hoodie.datasource.write.partitionpath.field=" \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl > multpk7.log
   OK
   ```
   
   cat multpk7.log
   ```
   2020-08-12 12:18:15,375 [main] WARN  org.apache.hudi.utilities.deltastreamer.SchedulerConfGenerator - Job Scheduling Configs will not be in effect as spark.scheduler.mode is not set to FAIR at instantiation time. Continuing without scheduling configs
   2020-08-12 12:18:16,386 [dispatcher-event-loop-3] INFO  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Connected to Spark cluster with app ID app-20200812121816-0086
   2020-08-12 12:18:17,199 [main] INFO  com.amazonaws.http.AmazonHttpClient - Configuring Proxy. redact
   2020-08-12 12:18:18,154 [main] INFO  org.apache.spark.scheduler.EventLoggingListener - Logging events to s3a://redact/sparkevents/app-20200812121816-0086
   2020-08-12 12:18:18,171 [dispatcher-event-loop-2] INFO  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Granted executor ID app-20200812121816-0086/0 on hostPort redact:19629 with 4 core(s), 7.9 GB RAM
   2020-08-12 12:18:18,195 [main] INFO  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
   2020-08-12 12:18:18,427 [main] WARN  org.apache.spark.SparkContext - Using an existing SparkContext; some configuration may not take effect.
   2020-08-12 12:18:18,526 [main] ERROR org.apache.hudi.common.util.DFSPropertiesConfiguration - Error reading in properies from dfs
   java.io.FileNotFoundException: File file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties does not exist
           at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
           at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
           at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
           at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
           at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
           at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
           at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
           at org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
           at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
           at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
           at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:97)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:91)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   2020-08-12 12:18:18,528 [main] WARN  org.apache.hudi.utilities.UtilHelpers - Unexpected error read props file at :file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties
   java.lang.IllegalArgumentException: Cannot read properties from dfs
           at org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:91)
           at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:60)
           at org.apache.hudi.common.util.DFSPropertiesConfiguration.<init>(DFSPropertiesConfiguration.java:64)
           at org.apache.hudi.utilities.UtilHelpers.readConfig(UtilHelpers.java:118)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer$DeltaSyncService.<init>(HoodieDeltaStreamer.java:451)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:97)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.<init>(HoodieDeltaStreamer.java:91)
           at org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer.main(HoodieDeltaStreamer.java:380)
           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(Method.java:498)
           at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
           at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:845)
           at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:161)
           at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:184)
           at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:86)
           at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:920)
           at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:929)
           at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
   Caused by: java.io.FileNotFoundException: File file:/home/ec2-user/http_listener/logs/src/test/resources/delta-streamer-config/dfs-source.properties does not exist
           at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:635)
           at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:861)
           at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:625)
           at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:442)
           at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
           at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
           at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:787)
           at org.apache.hudi.common.util.DFSPropertiesConfiguration.visitFile(DFSPropertiesConfiguration.java:87)
           ... 19 more
   2020-08-12 12:18:18,528 [main] INFO  org.apache.hudi.utilities.UtilHelpers - Adding overridden properties to file properties.
   2020-08-12 12:18:18,529 [main] INFO  org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Creating delta streamer with configs : {hoodie.datasource.hive_sync.use_jdbc=false, hoodie.datasource.write.recordkey.field=version_no,group_company, hoodie.datasource.write.partitionpath.field=, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator, hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor, hoodie.datasource.hive_sync.table=dmstest_multpk7, hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl, hoodie.datasource.hive_sync.database=redact}
   2020-08-12 12:18:18,533 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Creating delta streamer with configs : {hoodie.datasource.hive_sync.use_jdbc=false, hoodie.datasource.write.recordkey.field=version_no,group_company, hoodie.datasource.write.partitionpath.field=, hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator, hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor, hoodie.datasource.hive_sync.table=dmstest_multpk7, hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl, hoodie.datasource.hive_sync.database=redact}
   2020-08-12 12:18:19,798 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Setting up Hoodie Write Client
   2020-08-12 12:18:19,799 [main] INFO  org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Delta Streamer running only single round
   2020-08-12 12:18:20,218 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
   2020-08-12 12:18:20,222 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Checkpoint to resume from : Option{val=null}
   2020-08-12 12:18:42,136 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Setting up Hoodie Write Client
   2020-08-12 12:18:42,156 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Registering Schema :[{"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"Op","type":["string","null"]},{"name":"Id","type":["int","null"]},{"name":"AuditProcessHistoryId","type":["int","null"]},{"name":"org_id","type":["int","null"]},{"name":"org_name","type":["string","null"]},{"name":"org_sname","type":["string","null"]},{"name":"org_mnem","type":["string","null"]},{"name":"org_parent","type":["int","null"]},{"name":"percent_holding","type":["double","null"]},{"name":"group_company","type":["string","null"]},{"name":"grp_ord_for_cln","type":["string","null"]},{"name":"mkt_only","type":["string","null"]},{"name":"pro_rate_ind","type":["string","null"]},{"name":"show_shapes","type":["string","null"]},{"name":"sec_code_pref","type":["string","null"]},{"name":"alert_org_ref","type":["string","null"]},{"name":"swift_bic","type":["string","null"]},{"name":"exec_breakdown","type":["string","null"]},{"name":"notes","type":["string","null"]},{"name":"active","type":["string","null"]},{"name":"version_no","type":["int","null"]},{"name":"sys_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"sys_user","type":["string","null"]},{"name":"create_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"cntry_of_dom","type":["string","null"]},{"name":"client","type":["string","null"]},{"name":"alert_acronym","type":["string","null"]},{"name":"oneoff_client","type":["string","null"]},{"name":"booking_domicile","type":["string","null"]},{"name":"booking_dom_list","type":["string","null"]},{"name":"TimeCreated","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"UserCreated","type":["string","null"]}]}, {"type":"record","name":"hoodie_source","namespace":"hoodie.source","fields":[{"name":"Op","type":["string","null"]},{"name":"Id","type":["int","null"]},{"name":"AuditProcessHistoryId","type":["int","null"]},{"name":"org_id","type":["int","null"]},{"name":"org_name","type":["string","null"]},{"name":"org_sname","type":["string","null"]},{"name":"org_mnem","type":["string","null"]},{"name":"org_parent","type":["int","null"]},{"name":"percent_holding","type":["double","null"]},{"name":"group_company","type":["string","null"]},{"name":"grp_ord_for_cln","type":["string","null"]},{"name":"mkt_only","type":["string","null"]},{"name":"pro_rate_ind","type":["string","null"]},{"name":"show_shapes","type":["string","null"]},{"name":"sec_code_pref","type":["string","null"]},{"name":"alert_org_ref","type":["string","null"]},{"name":"swift_bic","type":["string","null"]},{"name":"exec_breakdown","type":["string","null"]},{"name":"notes","type":["string","null"]},{"name":"active","type":["string","null"]},{"name":"version_no","type":["int","null"]},{"name":"sys_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"sys_user","type":["string","null"]},{"name":"create_date","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"cntry_of_dom","type":["string","null"]},{"name":"client","type":["string","null"]},{"name":"alert_acronym","type":["string","null"]},{"name":"oneoff_client","type":["string","null"]},{"name":"booking_domicile","type":["string","null"]},{"name":"booking_dom_list","type":["string","null"]},{"name":"TimeCreated","type":[{"type":"long","logicalType":"timestamp-micros"},"null"]},{"name":"UserCreated","type":["string","null"]}]}]
   2020-08-12 12:18:50,361 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
   2020-08-12 12:18:50,934 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
   2020-08-12 12:18:50,937 [main] INFO  org.apache.hudi.client.HoodieWriteClient - Generate a new instant time 20200812121850
   2020-08-12 12:18:51,226 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants []
   2020-08-12 12:18:51,234 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Creating a new instant [==>20200812121850__commit__REQUESTED]
   2020-08-12 12:18:51,415 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Starting commit  : 20200812121850
   2020-08-12 12:18:51,699 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__REQUESTED]]
   2020-08-12 12:18:51,982 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__REQUESTED]]
   2020-08-12 12:19:21,501 [main] INFO  org.apache.hudi.index.bloom.HoodieBloomIndex - InputParallelism: ${1500}, IndexParallelism: ${0}
   2020-08-12 12:19:32,817 [main] INFO  org.apache.hudi.client.HoodieWriteClient - Workload profile :WorkloadProfile {globalStat=WorkloadStat {numInserts=103, numUpdates=0}, partitionStat={default=WorkloadStat {numInserts=103, numUpdates=0}}}
   2020-08-12 12:19:32,841 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file exists ?s3a://redact/my2/multpk7/.hoodie/20200812121850.commit.requested
   2020-08-12 12:19:33,081 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file for toInstant ?s3a://redact/my2/multpk7/.hoodie/20200812121850.inflight
   2020-08-12 12:19:33,082 [main] INFO  org.apache.hudi.table.HoodieCopyOnWriteTable - AvgRecordSize => 1024
   2020-08-12 12:19:33,184 [main] INFO  org.apache.hudi.table.HoodieCopyOnWriteTable - For partitionPath : default Small Files => []
   2020-08-12 12:19:33,184 [main] INFO  org.apache.hudi.table.HoodieCopyOnWriteTable - After small file assignment: unassignedInserts => 103, totalInsertBuckets => 1, recordsPerBucket => 122880
   2020-08-12 12:19:33,185 [main] INFO  org.apache.hudi.table.HoodieCopyOnWriteTable - Total insert buckets for partition path default => [WorkloadStat {bucketNumber=0, weight=1.0}]
   2020-08-12 12:19:33,186 [main] INFO  org.apache.hudi.table.HoodieCopyOnWriteTable - Total Buckets :1, buckets info => {0=BucketInfo {bucketType=INSERT, fileIdPrefix=a9ab6f7a-4def-490a-aac0-49e15ee9d742}},
   Partition to insert buckets => {default=[WorkloadStat {bucketNumber=0, weight=1.0}]},
   UpdateLocations mapped to buckets =>{}
   2020-08-12 12:19:33,206 [main] INFO  org.apache.hudi.client.AbstractHoodieWriteClient - Auto commit disabled for 20200812121850
   2020-08-12 12:19:41,179 [main] INFO  org.apache.hudi.client.AbstractHoodieWriteClient - Commiting 20200812121850
   2020-08-12 12:19:41,502 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__INFLIGHT]]
   2020-08-12 12:19:41,777 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__INFLIGHT]]
   2020-08-12 12:19:42,140 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__INFLIGHT]]
   2020-08-12 12:19:42,479 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__INFLIGHT]]
   2020-08-12 12:19:42,706 [main] INFO  org.apache.hudi.table.HoodieTable - Removing marker directory=s3a://redact/my2/multpk7/.hoodie/.temp/20200812121850
   2020-08-12 12:19:43,027 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Marking instant complete [==>20200812121850__commit__INFLIGHT]
   2020-08-12 12:19:43,027 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Checking for file exists ?s3a://redact/my2/multpk7/.hoodie/20200812121850.inflight
   2020-08-12 12:19:43,356 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Create new file for toInstant ?s3a://redact/my2/multpk7/.hoodie/20200812121850.commit
   2020-08-12 12:19:43,357 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Completed [==>20200812121850__commit__INFLIGHT]
   2020-08-12 12:19:43,745 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,010 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,084 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[==>20200812121850__commit__REQUESTED], [==>20200812121850__commit__INFLIGHT], [20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,085 [main] INFO  org.apache.hudi.table.HoodieCommitArchiveLog - No Instants to archive
   2020-08-12 12:19:44,086 [main] INFO  org.apache.hudi.client.HoodieWriteClient - Auto cleaning is enabled. Running cleaner now
   2020-08-12 12:19:44,356 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,629 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:44,912 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:45,321 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:45,337 [main] INFO  org.apache.hudi.table.CleanHelper - No earliest commit to retain. No need to scan partitions !!
   2020-08-12 12:19:45,337 [main] INFO  org.apache.hudi.table.HoodieCopyOnWriteTable - Nothing to clean here. It is already clean
   2020-08-12 12:19:45,374 [main] INFO  org.apache.hudi.client.AbstractHoodieWriteClient - Committed 20200812121850
   2020-08-12 12:19:45,374 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Commit 20200812121850 successful!
   2020-08-12 12:19:45,375 [main] INFO  org.apache.hudi.utilities.deltastreamer.DeltaSync - Syncing target hoodie table with hive table(dmstest_multpk7). Hive metastore URL :jdbc:hive2://localhost:10000, basePath :s3a://redact/my2/multpk7
   2020-08-12 12:19:45,636 [main] INFO  org.apache.hudi.common.table.timeline.HoodieActiveTimeline - Loaded instants [[20200812121850__commit__COMPLETED]]
   2020-08-12 12:19:46,806 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Trying to sync hoodie table dmstest_multpk7 with base path s3a://redact/my2/multpk7 of type COPY_ON_WRITE
   2020-08-12 12:19:46,864 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Reading schema from s3a://redact/my2/multpk7/default/a9ab6f7a-4def-490a-aac0-49e15ee9d742-0_0-25-15010_20200812121850.parquet
   2020-08-12 12:19:47,064 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Hive table dmstest_multpk7 is not found. Creating it
   2020-08-12 12:19:47,070 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Creating table with CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk7`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk7'
   2020-08-12 12:19:47,151 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Time taken to start SessionState and create Driver: 81 ms
   2020-08-12 12:19:47,186 [main] INFO  hive.ql.parse.ParseDriver - Parsing command: CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk7`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk7'
   2020-08-12 12:19:47,874 [main] INFO  hive.ql.parse.ParseDriver - Parse Completed
   2020-08-12 12:19:48,323 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Time taken to execute [CREATE EXTERNAL TABLE  IF NOT EXISTS `redact`.`dmstest_multpk7`( `_hoodie_commit_time` string, `_hoodie_commit_seqno` string, `_hoodie_record_key` string, `_hoodie_partition_path` string, `_hoodie_file_name` string, `Op` string, `Id` int, `AuditProcessHistoryId` int, `org_id` int, `org_name` string, `org_sname` string, `org_mnem` string, `org_parent` int, `percent_holding` double, `group_company` string, `grp_ord_for_cln` string, `mkt_only` string, `pro_rate_ind` string, `show_shapes` string, `sec_code_pref` string, `alert_org_ref` string, `swift_bic` string, `exec_breakdown` string, `notes` string, `active` string, `version_no` int, `sys_date` bigint, `sys_user` string, `create_date` bigint, `cntry_of_dom` string, `client` string, `alert_acronym` string, `oneoff_client` string, `booking_domicile` string, `booking_dom_list` string, `TimeCreated` bigint, `UserCreated` string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hudi.hadoop.HoodieParquetInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat' LOCATION 's3a://redact/my2/multpk7']: 1171 ms
   2020-08-12 12:19:48,329 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Schema sync complete. Syncing partitions for dmstest_multpk7
   2020-08-12 12:19:48,329 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Last commit time synced was found to be null
   2020-08-12 12:19:48,330 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - Last commit time synced is not known, listing all partitions in s3a://redact/my2/multpk7,FS :S3AFileSystem{uri=s3a://redact, workingDir=s3a://redact/user/ec2-user, inputPolicy=normal, partSize=104857600, enableMultiObjectsDelete=true, maxKeys=5000, readAhead=65536, blockSize=33554432, multiPartThreshold=2147483647, serverSideEncryptionAlgorithm='AES256', blockFactory=org.apache.hadoop.fs.s3a.S3ADataBlocks$DiskBlockFactory@62765aec, boundedExecutor=BlockingThreadPoolExecutorService{SemaphoredDelegatingExecutor{permitCount=2405, available=2405, waiting=0}, activeCount=0}, unboundedExecutor=java.util.concurrent.ThreadPoolExecutor@6f5bd362[Running, pool size = 6, active threads = 0, queued tasks = 0, completed tasks = 6], statistics {445890 bytes read, 4324 bytes written, 172 read ops, 0 large read ops, 31 write ops}, metrics {{Context=S3AFileSystem} {FileSystemId=aad8f6ce-2b40-4ddb-9b9b-4e82033cb193-redact} {fsURI=s3a://redact/sparkevents} {files_created=5} {files_copied=0} {files_copied_bytes=0} {files_deleted=1} {fake_directories_deleted=0} {directories_created=6} {directories_deleted=0} {ignored_errors=4} {op_copy_from_local_file=0} {op_exists=53} {op_get_file_status=145} {op_glob_status=0} {op_is_directory=38} {op_is_file=0} {op_list_files=1} {op_list_located_status=0} {op_list_status=19} {op_mkdirs=5} {op_rename=0} {object_copy_requests=0} {object_delete_requests=5} {object_list_requests=140} {object_continue_list_requests=0} {object_metadata_requests=265} {object_multipart_aborted=0} {object_put_bytes=4324} {object_put_requests=10} {object_put_requests_completed=10} {stream_write_failures=0} {stream_write_block_uploads=0} {stream_write_block_uploads_committed=0} {stream_write_block_uploads_aborted=0} {stream_write_total_time=0} {stream_write_total_data=4324} {object_put_requests_active=0} {object_put_bytes_pending=0} {stream_write_block_uploads_active=0} {stream_write_block_uploads_pending=4} {stream_write_block_uploads_data_pending=0} {stream_read_fully_operations=0} {stream_opened=22} {stream_bytes_skipped_on_seek=0} {stream_closed=22} {stream_bytes_backwards_on_seek=438082} {stream_bytes_read=445890} {stream_read_operations_incomplete=71} {stream_bytes_discarded_in_abort=0} {stream_close_operations=22} {stream_read_operations=2764} {stream_aborted=0} {stream_forward_seek_operations=0} {stream_backward_seek_operations=1} {stream_seek_operations=1} {stream_bytes_read_in_close=8} {stream_read_exceptions=0} }}
   2020-08-12 12:19:48,584 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Storage partitions scan complete. Found 1
   2020-08-12 12:19:48,613 [main] INFO  org.apache.hudi.hive.HiveSyncTool - New Partitions []
   2020-08-12 12:19:48,614 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - No partitions to add for dmstest_multpk7
   2020-08-12 12:19:48,614 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Changed Partitions []
   2020-08-12 12:19:48,614 [main] INFO  org.apache.hudi.hive.HoodieHiveClient - No partitions to change for dmstest_multpk7
   2020-08-12 12:19:49,002 [main] INFO  org.apache.hudi.hive.HiveSyncTool - Sync complete for dmstest_multpk7
   2020-08-12 12:19:49,031 [main] INFO  org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer - Shut down deltastreamer
   2020-08-12 12:19:49,044 [main] INFO  org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend - Shutting down all executors
   ```
   
   ```
   aws s3 ls s3://redact/my2/multpk7/
                              PRE .hoodie/
                              PRE default/
                              
   aws s3 ls s3://redact/my2/multpk7/default/
   2020-08-12 12:19:39         93 .hoodie_partition_metadata
   2020-08-12 12:19:41     452644 a9ab6f7a-4def-490a-aac0-49e15ee9d742-0_0-25-15010_20200812121850.parquet
   ```
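   
   The listing makes the mismatch concrete: the CREATE TABLE in the log registered LOCATION s3a://redact/my2/multpk7, but the parquet file sits one level deeper under default/. A small illustration of why the query then returns nothing (hypothetical snippet, just restating the paths above):
   
   ```
   // Hypothetical illustration of the location mismatch described above: hive sync
   // registered the table at the base path, while the key generator wrote the data
   // one level deeper, under the "default" partition directory.
   public class LocationMismatchSketch {
     public static void main(String[] args) {
       String tableLocation = "s3a://redact/my2/multpk7";       // LOCATION in the CREATE TABLE
       String dataFileDir = "s3a://redact/my2/multpk7/default"; // where the parquet file lives
       // A non-partitioned Hive table scans only files directly under its LOCATION,
       // so queries find no data until LOCATION points at the "default" directory.
       System.out.println(dataFileDir.equals(tableLocation)
           ? "locations match"
           : "data sits in a subdirectory the table scan never reaches");
     }
   }
   ```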
   



[GitHub] [hudi] tooptoop4 commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-672831272


   Even with NonPartitionedExtractor I am getting the same issue.



[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

bvaradar commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-680168375


   @satishkotha: Would you be able to help reproduce this?



[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

bvaradar commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-691745743


   Closing this issue. 



[GitHub] [hudi] bvaradar commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

bvaradar commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-679305798


   @tooptoop4: IIUC, are you effectively changing a table from non-partitioned to partitioned? The exception you added in the last comment is about a missing file, which does not tie up with your comments. Can you elaborate on the steps to reproduce this?



[GitHub] [hudi] tooptoop4 commented on issue #1954: [SUPPORT] DMS Caused by: java.lang.IllegalArgumentException: Partition key parts [] does not match with partition values

tooptoop4 commented on issue #1954:
URL: https://github.com/apache/hudi/issues/1954#issuecomment-679353671


   @bvaradar, in each comment I am trying brand-new tables with different spark-submits, so I am not changing an existing table.
   
   Try to reproduce with:
   
   /home/ec2-user/spark_home/bin/spark-submit \
     --conf "spark.hadoop.fs.s3a.proxy.host=redact" \
     --conf "spark.hadoop.fs.s3a.proxy.port=redact" \
     --conf "spark.driver.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --conf "spark.executor.extraClassPath=/home/ec2-user/json-20090211.jar" \
     --class org.apache.hudi.utilities.deltastreamer.HoodieDeltaStreamer \
     --jars "/home/ec2-user/spark-avro_2.11-2.4.6.jar" \
     --master spark://redact:7077 \
     --deploy-mode client \
     /home/ec2-user/hudi-utilities-bundle_2.11-0.5.3-1.jar \
     --table-type COPY_ON_WRITE \
     --source-ordering-field TimeCreated \
     --source-class org.apache.hudi.utilities.sources.ParquetDFSSource \
     --enable-hive-sync \
     --hoodie-conf hoodie.datasource.hive_sync.database=redact \
     --hoodie-conf hoodie.datasource.hive_sync.table=dmstest_multpk7 \
     --hoodie-conf hoodie.datasource.hive_sync.partition_extractor_class=org.apache.hudi.hive.NonPartitionedExtractor \
     --hoodie-conf hoodie.datasource.hive_sync.use_jdbc=false \
     --target-base-path s3a://redact/my2/multpk7 \
     --target-table dmstest_multpk7 \
     --transformer-class org.apache.hudi.utilities.transform.AWSDmsTransformer \
     --payload-class org.apache.hudi.payload.AWSDmsAvroPayload \
     --hoodie-conf hoodie.datasource.write.keygenerator.class=org.apache.hudi.keygen.ComplexKeyGenerator \
     --hoodie-conf hoodie.datasource.write.recordkey.field=version_no,group_company \
     --hoodie-conf "hoodie.datasource.write.partitionpath.field=" \
     --hoodie-conf hoodie.deltastreamer.source.dfs.root=s3a://redact/dbo/tbl
   

