Posted to commits@hudi.apache.org by "Ethan Guo (Jira)" <ji...@apache.org> on 2022/01/01 08:00:00 UTC

[jira] [Updated] (HUDI-3104) Hudi-kafka-connect can not scan hadoop config files by HADOOP_CONF_DIR

     [ https://issues.apache.org/jira/browse/HUDI-3104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ethan Guo updated HUDI-3104:
----------------------------
        Parent: HUDI-2324
    Issue Type: Sub-task  (was: Bug)

> Hudi-kafka-connect can not scan hadoop config files by HADOOP_CONF_DIR
> ----------------------------------------------------------------------
>
>                 Key: HUDI-3104
>                 URL: https://issues.apache.org/jira/browse/HUDI-3104
>             Project: Apache Hudi
>          Issue Type: Sub-task
>          Components: configs
>            Reporter: cdmikechen
>            Assignee: cdmikechen
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.11.0, 0.10.1
>
>
> I used hudi-kafka-connect to test pulling Kafka topic data into Hudi. I built a Kafka Connect Docker image with this Dockerfile:
> {code}
> FROM confluentinc/cp-kafka-connect:6.1.1
> RUN confluent-hub install --no-prompt confluentinc/kafka-connect-hdfs:10.1.3
> COPY hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar /usr/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib
> {code}
> When I started the container and submitted a sink task, Hudi reported this error:
> {code}
> [2021-12-27 15:04:55,214] INFO Setting record key volume and partition fields date for table hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topichudi-test-topic (org.apache.hudi.connect.writers.KafkaConnectTransactionServices)
> [2021-12-27 15:04:55,224] INFO Initializing hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topic as hoodie table hdfs://hdp-syzh-cluster/hive/warehouse/default.db/hudi-test-topic (org.apache.hudi.common.table.HoodieTableMetaClient)
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/usr/share/confluent-hub-components/confluentinc-kafka-connect-hdfs/lib/hadoop-auth-2.10.1.jar) to method sun.security.krb5.Config.getInstance()
> WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> [2021-12-27 15:04:55,571] WARN Unable to load native-hadoop library for your platform... using builtin-java classes where applicable (org.apache.hadoop.util.NativeCodeLoader)
> [2021-12-27 15:04:56,154] ERROR Fatal error initializing task null for partition 0 (org.apache.hudi.connect.HoodieSinkTask)
> org.apache.hudi.exception.HoodieException: Fatal error instantiating Hudi Transaction Services 
> 	at org.apache.hudi.connect.writers.KafkaConnectTransactionServices.<init>(KafkaConnectTransactionServices.java:113) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.hudi.connect.transaction.ConnectTransactionCoordinator.<init>(ConnectTransactionCoordinator.java:88) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.hudi.connect.HoodieSinkTask.bootstrap(HoodieSinkTask.java:191) [hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.hudi.connect.HoodieSinkTask.open(HoodieSinkTask.java:151) [hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.openPartitions(WorkerSinkTask.java:640) [connect-runtime-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.access$1100(WorkerSinkTask.java:71) [connect-runtime-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask$HandleRebalance.onPartitionsAssigned(WorkerSinkTask.java:705) [connect-runtime-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.invokePartitionsAssigned(ConsumerCoordinator.java:293) [kafka-clients-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.onJoinComplete(ConsumerCoordinator.java:430) [kafka-clients-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.joinGroupIfNeeded(AbstractCoordinator.java:449) [kafka-clients-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:365) [kafka-clients-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.poll(ConsumerCoordinator.java:508) [kafka-clients-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.updateAssignmentMetadataIfNeeded(KafkaConsumer.java:1257) [kafka-clients-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1226) [kafka-clients-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1206) [kafka-clients-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.pollConsumer(WorkerSinkTask.java:457) [connect-runtime-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:324) [connect-runtime-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232) [connect-runtime-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201) [connect-runtime-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189) [connect-runtime-6.1.1-ccs.jar:?]
> 	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:238) [connect-runtime-6.1.1-ccs.jar:?]
> 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
> 	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
> 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) [?:?]
> 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) [?:?]
> 	at java.lang.Thread.run(Thread.java:829) [?:?]
> Caused by: java.lang.IllegalArgumentException: java.net.UnknownHostException: hdp-syzh-cluster
> 	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:443) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:142) ~[hadoop-hdfs-client-2.10.1.jar:?]
> 	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:369) ~[hadoop-hdfs-client-2.10.1.jar:?]
> 	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:303) ~[hadoop-hdfs-client-2.10.1.jar:?]
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:159) ~[hadoop-hdfs-client-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3247) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3296) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3264) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:102) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.hudi.common.table.HoodieTableMetaClient.initTableAndGetMetaClient(HoodieTableMetaClient.java:350) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.hudi.common.table.HoodieTableMetaClient$PropertyBuilder.initTable(HoodieTableMetaClient.java:897) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.hudi.connect.writers.KafkaConnectTransactionServices.<init>(KafkaConnectTransactionServices.java:109) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	... 25 more
> Caused by: java.net.UnknownHostException: hdp-syzh-cluster
> 	at org.apache.hadoop.security.SecurityUtil.buildTokenService(SecurityUtil.java:443) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.hdfs.NameNodeProxiesClient.createProxyWithClientProtocol(NameNodeProxiesClient.java:142) ~[hadoop-hdfs-client-2.10.1.jar:?]
> 	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:369) ~[hadoop-hdfs-client-2.10.1.jar:?]
> 	at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:303) ~[hadoop-hdfs-client-2.10.1.jar:?]
> 	at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:159) ~[hadoop-hdfs-client-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3247) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:121) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3296) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3264) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:475) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hadoop.fs.Path.getFileSystem(Path.java:356) ~[hadoop-common-2.10.1.jar:?]
> 	at org.apache.hudi.common.fs.FSUtils.getFs(FSUtils.java:102) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.hudi.common.table.HoodieTableMetaClient.initTableAndGetMetaClient(HoodieTableMetaClient.java:350) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.hudi.common.table.HoodieTableMetaClient$PropertyBuilder.initTable(HoodieTableMetaClient.java:897) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	at org.apache.hudi.connect.writers.KafkaConnectTransactionServices.<init>(KafkaConnectTransactionServices.java:109) ~[hudi-kafka-connect-bundle-0.11.0-SNAPSHOT.jar:0.11.0-SNAPSHOT]
> 	... 25 more
> [2021-12-27 15:05:51,434] ERROR WorkerSinkTask{id=hudi-sink-0} RetriableException from SinkTask: (org.apache.kafka.connect.runtime.WorkerSinkTask)
> org.apache.kafka.connect.errors.RetriableException: TransactionParticipant should be created for each assigned partition, but has not been created for the topic/partition: hudi-test-topic:0
> 	at org.apache.hudi.connect.HoodieSinkTask.put(HoodieSinkTask.java:111)
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:586)
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:329)
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:232)
> 	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
> 	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:189)
> 	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:238)
> 	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
> 	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
> 	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
> 	at java.base/java.lang.Thread.run(Thread.java:829)
> {code}
> I set *HADOOP_CONF_DIR* and it didn't work.
> So I checked the code and found that Hudi builds the HDFS configuration with the default *Configuration* constructor, which only loads the defaults on the classpath; the core-site.xml / hdfs-site.xml files under *HADOOP_CONF_DIR* are never read, so setting that variable has no effect.
> Because a Kafka Connect worker does not ship with an HDFS client environment by default, hudi-kafka-connect needs the ability to pick up *HADOOP_HOME* or *HADOOP_CONF_DIR* and load the Hadoop config files from there, as sketched below.
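> Below is a minimal sketch of that idea (not the actual patch; the loader class, its method name, and the fallback to HADOOP_HOME/etc/hadoop are assumptions for illustration):
> {code}
> // Sketch only: build a Hadoop Configuration that also loads the *-site.xml
> // files from HADOOP_CONF_DIR (or HADOOP_HOME/etc/hadoop as a fallback), so
> // that an HA nameservice such as "hdp-syzh-cluster" can be resolved.
> import java.io.File;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
>
> public class HadoopConfLoader {
>
>   public static Configuration loadHadoopConf() {
>     Configuration conf = new Configuration(); // classpath defaults only
>     String confDir = System.getenv("HADOOP_CONF_DIR");
>     if (confDir == null || confDir.isEmpty()) {
>       String hadoopHome = System.getenv("HADOOP_HOME");
>       if (hadoopHome != null && !hadoopHome.isEmpty()) {
>         confDir = hadoopHome + File.separator + "etc" + File.separator + "hadoop";
>       }
>     }
>     if (confDir != null && !confDir.isEmpty()) {
>       for (String name : new String[] {"core-site.xml", "hdfs-site.xml"}) {
>         File f = new File(confDir, name);
>         if (f.exists()) {
>           // addResource overlays the site file on top of the defaults
>           conf.addResource(new Path(f.getAbsolutePath()));
>         }
>       }
>     }
>     return conf;
>   }
> }
> {code}
> If a Configuration built like this were passed down to FSUtils.getFs / HoodieTableMetaClient instead of a bare new Configuration(), the HA nameservice would resolve and the UnknownHostException above should go away.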



--
This message was sent by Atlassian Jira
(v8.20.1#820001)