Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/30 03:55:17 UTC

[GitHub] [hudi] zhouxumeng213 opened a new issue, #6009: Kafka Connect to Hudi: when 100,000 records are inserted, data is successfully written to Kafka but is deleted after landing from Kafka into Hudi

zhouxumeng213 opened a new issue, #6009:
URL: https://github.com/apache/hudi/issues/6009

   [2022-06-30 10:27:37,174] WARN Error while syncing (org.apache.hadoop.hdfs.DFSClient:694)
   org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /tmp/huditest/partition_0/.hoodie_partition_metadata_0 (inode 16466) [Lease.  Holder: DFSClient_NONMAPREDUCE_-535797949_164, pending creates: 1]
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2898)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.fsync(FSNamesystem.java:3370)
   	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.fsync(NameNodeRpcServer.java:1468)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.fsync(ClientNamenodeProtocolServerSideTranslatorPB.java:1056)
   	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
   	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
   	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
   
   	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
   	at org.apache.hadoop.ipc.Client.call(Client.java:1495)
   	at org.apache.hadoop.ipc.Client.call(Client.java:1394)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
   	at com.sun.proxy.$Proxy50.fsync(Unknown Source)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.fsync(ClientNamenodeProtocolTranslatorPB.java:861)
   	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
   	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   	at java.lang.reflect.Method.invoke(Method.java:498)
   	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
   	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
   	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
   	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
   	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
   	at com.sun.proxy.$Proxy51.fsync(Unknown Source)
   	at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:668)
   	at org.apache.hadoop.hdfs.DFSOutputStream.hsync(DFSOutputStream.java:536)
   	at org.apache.hadoop.fs.FSDataOutputStream.hsync(FSDataOutputStream.java:147)
   	at org.apache.hadoop.fs.FSDataOutputStream.hsync(FSDataOutputStream.java:147)
   	at org.apache.hudi.common.model.HoodiePartitionMetadata.trySave(HoodiePartitionMetadata.java:96)
   	at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:99)
   	at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:74)
   	at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:46)
   	at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:83)
   	at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:40)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
   	at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
   	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
   	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
   	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
   	at java.lang.Thread.run(Thread.java:748)
   [2022-06-30 10:27:37,185] WARN Caught exception (org.apache.hadoop.hdfs.DataStreamer:982)
   java.lang.InterruptedException
   	at java.lang.Object.wait(Native Method)
   	at java.lang.Thread.join(Thread.java:1252)
   	at java.lang.Thread.join(Thread.java:1326)
   	at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
   	at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:844)
   	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:840)
   [2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:37,427] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:37,427] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:37,427] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:37,427] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
   [2022-06-30 10:27:38,588] WARN DataStreamer Exception (org.apache.hadoop.hdfs.DataStreamer:823)
   java.io.FileNotFoundException: File does not exist: /tmp/huditest/partition_1/.hoodie_partition_metadata_0 (inode 16491) [Lease.  Holder: DFSClient_NONMAPREDUCE_-535797949_164, pending creates: 1]
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2898)
   	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
   	at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
   	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2777)
   	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
   	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
   	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
   	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
   	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
   	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
   	at java.security.AccessController.doPrivileged(Native Method)
   	at javax.security.auth.Subject.doAs(Subject.java:422)
   	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
   	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)




[GitHub] [hudi] zhouxumeng213 commented on issue #6009: Kafka Connect to Hudi: when 100,000 records are inserted, data is successfully written to Kafka but is deleted after landing from Kafka into Hudi

Posted by GitBox <gi...@apache.org>.
zhouxumeng213 commented on issue #6009:
URL: https://github.com/apache/hudi/issues/6009#issuecomment-1171841056

   ### Environment Description
   
   Hudi version : 0.10.0
   
   Spark version : 2.4.5
   
   Hadoop version : 3.2.1
   
   Storage (HDFS/S3/GCS..) : HDFS
   
   Running on Docker? (yes/no) : no
   
   ### Steps
   1. ZooKeeper and Kafka have been started;
   2. Run setupKafka.sh under hudi-master/hudi-kafka-connect/demo to produce 100,000 test records: sh setupKafka.sh -n 100000
   3. Go to the Kafka home directory and run: ./bin/connect-distributed.sh /opt/hudi-kafka-connect/demo/connect-distributed.properties
   4. Create the connector with a REST request: curl -X POST -H "Content-Type:application/json" -d @/opt/config-sink-test.json http://master:8084/connectors (a quick sanity check is sketched below)
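
   As a quick sanity check (not part of the original steps), the standard Kafka Connect REST endpoints can confirm that the worker is up and the connector was registered; this assumes the worker listener on port 8084 and the host name "master" used in the curl above:

   # Generic Kafka Connect REST checks (a verification sketch, not from the report):
   curl -s http://master:8084/                # worker version info
   curl -s http://master:8084/connectors      # should list ["hudi-test-topic"]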
   
   ### Notes
   1. connect-distributed.properties configuration:
   
   bootstrap.servers=51.38.135.107:9092
   group.id=hudi-connect-cluster
   key.converter=org.apache.kafka.connect.json.JsonConverter
   value.converter=org.apache.kafka.connect.json.JsonConverter
   key.converter.schemas.enable=true
   value.converter.schemas.enable=true
   offset.storage.topic=connect-offsets
   offset.storage.replication.factor=1
   config.storage.topic=connect-configs
   config.storage.replication.factor=1
   status.storage.topic=connect-status
   status.storage.replication.factor=1
   
   offset.flush.interval.ms=60000
   listeners=HTTP://:8084
   plugin.path=/usr/local/share/kafka/plugins
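
   One generic way (not from the original report) to confirm that plugin.path was picked up by the worker is the standard /connector-plugins endpoint; org.apache.hudi.connect.HoodieSinkConnector should appear in the output:

   # Hedged check that the Hudi sink plugin was loaded:
   curl -s http://master:8084/connector-plugins | grep -i hudi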
   
   2. config-sink-test.json configuration:
   
   {
       "name": "hudi-test-topic",
       "config": {
           "bootstrap.servers": "51.38.135.107:9092",
           "connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
           "tasks.max": "12",
           "key.converter": "org.apache.kafka.connect.storage.StringConverter",
           "value.converter": "org.apache.kafka.connect.storage.StringConverter",
           "value.converter.schemas.enable": "false",
           "topics": "hudi-test-topic15",
           "hoodie.table.name": "test_hudi_table",
           "hoodie.table.type": "COPY_ON_WRITE",
           "hoodie.base.path": "hdfs://tdh0623hudi.storage.huawei.com/tmp/huditest",
           "hoodie.datasource.write.partitionpath.field": "date",
           "hoodie.datasource.write.recordkey.field": "volume",
           "hoodie.schemaprovider.class": "org.apache.hudi.schema.FilebasedSchemaProvider",
           "hoodie.deltastreamer.schemaprovider.source.schema.file": "hdfs://tdh0623hudi.storage.huawei.com/tmp/schema.avsc",
           "hoodie.clean.automatic": false,
           "hoodie.kafka.commit.interval.secs": 60
       }
   }
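
   Before POSTing, the JSON can be validated locally, and after creation the per-task state (tasks.max is 12 here) can be inspected; these are generic checks assuming python and curl are available, not steps from the original report:

   # Validate the sink config JSON syntax:
   python -m json.tool /opt/config-sink-test.json
   # Inspect the connector and the state of each of its tasks:
   curl -s http://master:8084/connectors/hudi-test-topic/status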
   
   ### Reference link:
   https://github.com/apache/hudi/tree/master/hudi-kafka-connect
   
   ### Problem description:
   After the connector request is made, data is written to HDFS but is deleted after a while. The Kafka Connect logs are shown in the issue description above. Smaller amounts of data (e.g., 20,000 or 50,000 records) cause no problem; the issue only appears at 100,000 records.
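
   To narrow down when the files disappear, one hedged diagnostic (not from the original report; the table path comes from hoodie.base.path above, and the hdfs CLI is assumed to be on the PATH) is to watch the table directory and the Hudi timeline:

   # Watch the table path to see data files appear and then get deleted:
   watch -n 5 'hdfs dfs -ls -R /tmp/huditest'
   # List the Hudi timeline; rollback/clean instants around the failure time would explain the deletions:
   hdfs dfs -ls /tmp/huditest/.hoodie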




[GitHub] [hudi] yihua commented on issue #6009: Kafka Connect to Hudi: when 100,000 records are inserted, data is successfully written to Kafka but is deleted after landing from Kafka into Hudi

Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6009:
URL: https://github.com/apache/hudi/issues/6009#issuecomment-1171776555

   Hi @zhouxumeng213, could you provide the detailed setup (environment, versions, etc.) and the steps to reproduce the errors?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org