You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/06/30 03:55:17 UTC
[GitHub] [hudi] zhouxumeng213 opened a new issue, #6009: kafka connect to Hudi,When 10w pieces of data are inserted, data is successfully written to kafka and deleted after kafka is dropped to hudi
zhouxumeng213 opened a new issue, #6009:
URL: https://github.com/apache/hudi/issues/6009
[2022-06-30 10:27:37,174] WARN Error while syncing (org.apache.hadoop.hdfs.DFSClient:694)
org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File does not exist: /tmp/huditest/partition_0/.hoodie_partition_metadata_0 (inode 16466) [Lease. Holder: DFSClient_NONMAPREDUCE_-535797949_164, pending creates: 1]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2898)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.fsync(FSNamesystem.java:3370)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.fsync(NameNodeRpcServer.java:1468)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.fsync(ClientNamenodeProtocolServerSideTranslatorPB.java:1056)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
at org.apache.hadoop.ipc.Client.call(Client.java:1495)
at org.apache.hadoop.ipc.Client.call(Client.java:1394)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy50.fsync(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.fsync(ClientNamenodeProtocolTranslatorPB.java:861)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy51.fsync(Unknown Source)
at org.apache.hadoop.hdfs.DFSOutputStream.flushOrSync(DFSOutputStream.java:668)
at org.apache.hadoop.hdfs.DFSOutputStream.hsync(DFSOutputStream.java:536)
at org.apache.hadoop.fs.FSDataOutputStream.hsync(FSDataOutputStream.java:147)
at org.apache.hadoop.fs.FSDataOutputStream.hsync(FSDataOutputStream.java:147)
at org.apache.hudi.common.model.HoodiePartitionMetadata.trySave(HoodiePartitionMetadata.java:96)
at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:99)
at org.apache.hudi.io.HoodieCreateHandle.<init>(HoodieCreateHandle.java:74)
at org.apache.hudi.io.CreateHandleFactory.create(CreateHandleFactory.java:46)
at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:83)
at org.apache.hudi.execution.CopyOnWriteInsertHandler.consumeOneRecord(CopyOnWriteInsertHandler.java:40)
at org.apache.hudi.common.util.queue.BoundedInMemoryQueueConsumer.consume(BoundedInMemoryQueueConsumer.java:37)
at org.apache.hudi.common.util.queue.BoundedInMemoryExecutor.lambda$null$2(BoundedInMemoryExecutor.java:121)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
[2022-06-30 10:27:37,185] WARN Caught exception (org.apache.hadoop.hdfs.DataStreamer:982)
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1252)
at java.lang.Thread.join(Thread.java:1326)
at org.apache.hadoop.hdfs.DataStreamer.closeResponder(DataStreamer.java:980)
at org.apache.hadoop.hdfs.DataStreamer.closeInternal(DataStreamer.java:844)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:840)
[2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:37,427] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:37,427] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:37,426] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:37,427] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:37,427] INFO Got brand-new compressor [.gz] (org.apache.hadoop.io.compress.CodecPool:153)
[2022-06-30 10:27:38,588] WARN DataStreamer Exception (org.apache.hadoop.hdfs.DataStreamer:823)
java.io.FileNotFoundException: File does not exist: /tmp/huditest/partition_1/.hoodie_partition_metadata_0 (inode 16491) [Lease. Holder: DFSClient_NONMAPREDUCE_-535797949_164, pending creates: 1]
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2898)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.analyzeFileState(FSDirWriteFileOp.java:599)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2777)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:892)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:574)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:528)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1070)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:999)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:927)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2915)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] zhouxumeng213 commented on issue #6009: kafka connect to Hudi,When 10w pieces of data are inserted, data is successfully written to kafka and deleted after kafka is dropped to hudi
Posted by GitBox <gi...@apache.org>.
zhouxumeng213 commented on issue #6009:
URL: https://github.com/apache/hudi/issues/6009#issuecomment-1171841056
### Environment Description
Hudi version : 0.10.0
Spark version : 2.4.5
Hadoop version : 3.2.1
Storage (HDFS/S3/GCS..) : HDFS
Running on Docker? (yes/no) : no
### steps
1、Zookeeper and Kafka have been started;
2、Run setupkafka. sh under hudi-master/hudi-kafka-connect/demo to create the number:sh setupKafka.sh -n 100000
3、Go to the kafka home directory and run:./bin/connect-distributed.sh /opt/hudi-kafka-connect/demo/connect-distributed.properties
4、Initiate a CONNECT request and execute:curl -X POST -H "Content-Type:application/json" -d @/opt/config-sink-test.json http://master:8084/connectors
### note:
1、connect-distributed.properties configuration:
bootstrap.servers=51.38.135.107:9092
group.id=hudi-connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.topic=connect-offsets
offset.storage.replication.factor=1
config.storage.topic=connect-configs
config.storage.replication.factor=1
status.storage.topic=connect-status
status.storage.replication.factor=1
offset.flush.interval.ms=60000
listeners=HTTP://:8084
plugin.path=/usr/local/share/kafka/plugins
2、config-sink-test.json configuration:
{
"name": "hudi-test-topic",
"config": {
"bootstrap.servers": "51.38.135.107:9092",
"connector.class": "org.apache.hudi.connect.HoodieSinkConnector",
"tasks.max": "12",
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter.schemas.enable": "false",
"topics": "hudi-test-topic15",
"hoodie.table.name": "test_hudi_table",
"hoodie.table.type": "COPY_ON_WRITE",
"hoodie.base.path": "hdfs://tdh0623hudi.storage.huawei.com/tmp/huditest",
"hoodie.datasource.write.partitionpath.field": "date",
"hoodie.datasource.write.recordkey.field": "volume",
"hoodie.schemaprovider.class": "org.apache.hudi.schema.FilebasedSchemaProvider",
"hoodie.deltastreamer.schemaprovider.source.schema.file": "hdfs://tdh0623hudi.storage.huawei.com/tmp/schema.avsc",
"hoodie.clean.automatic": false,
"hoodie.kafka.commit.interval.secs": 60
}
}
### Refer to the link:
https://github.com/apache/hudi/tree/master/hudi-kafka-connect
### Phenomenon of the problem:
After a connect request is made, data will be written to HDFS, but will be deleted after a while. Kafka Connect logs are as follows,Small amount of data such as (20,000, 50,000, etc., no problem, 100,000 will have this problem)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [hudi] yihua commented on issue #6009: kafka connect to Hudi,When 10w pieces of data are inserted, data is successfully written to kafka and deleted after kafka is dropped to hudi
Posted by GitBox <gi...@apache.org>.
yihua commented on issue #6009:
URL: https://github.com/apache/hudi/issues/6009#issuecomment-1171776555
Hi @zhouxumeng213 , could you provide the detailed setup (environment, versions, etc.) and the steps to reproduce the errors?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org