Posted to users@kafka.apache.org by chinchu chinchu <ch...@gmail.com> on 2019/01/23 21:57:24 UTC
Kafka Hdfs Connect Flush Size
Hey folks,
I have been going through the HDFS connector code and have one question.
Is the flush size in the connector config the number of records read from a
Kafka partition, or the number of records written to an HDFS path?
It looks like the recordCounter in TopicPartitionWriter is incremented for
every record received from a Kafka partition.
If flush size applied at the HDFS file level, how would the connector handle
records from the same Kafka partition that go to two different HDFS paths?
After looking through the code and running the TopicPartitionWriter test cases,
I think the flush size is the number of records written to HDFS from a single
Kafka partition, regardless of path. I ran the test case
testWriteRecordFieldPartitioner() in TopicPartitionWriterTest and saw the
same. Can someone clarify whether my understanding is right?
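
To make the reading above concrete, here is a minimal toy model in Java. It is not the connector's code: PartitionWriterModel, its methods, and the example paths are all hypothetical names made up for illustration. It assumes, as described above, that there is one record counter per Kafka partition and that reaching the flush size commits every file that partition currently has open.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: this is NOT the connector's TopicPartitionWriter.
public class FlushSizeSketch {

    /** Hypothetical stand-in for a writer that handles one Kafka partition. */
    static class PartitionWriterModel {
        private final int flushSize;
        // Assumption: a single counter per Kafka partition, not per HDFS path.
        private int recordCounter = 0;
        // Records buffered per output path since the last commit.
        private final Map<String, List<String>> openFiles = new LinkedHashMap<>();
        // One entry per commit: path -> number of records in the committed file.
        private final List<Map<String, Integer>> commits = new ArrayList<>();

        PartitionWriterModel(int flushSize) {
            this.flushSize = flushSize;
        }

        /** Buffers a record under its path and commits once flushSize is reached. */
        void write(String hdfsPath, String record) {
            openFiles.computeIfAbsent(hdfsPath, p -> new ArrayList<>()).add(record);
            recordCounter++;
            if (recordCounter >= flushSize) {
                commitAll();
            }
        }

        /** Commits every open file for this partition and resets the counter. */
        private void commitAll() {
            Map<String, Integer> sizes = new LinkedHashMap<>();
            for (Map.Entry<String, List<String>> e : openFiles.entrySet()) {
                sizes.put(e.getKey(), e.getValue().size());
            }
            commits.add(sizes);
            openFiles.clear();
            recordCounter = 0;
        }

        List<Map<String, Integer>> commits() {
            return commits;
        }

        int pendingRecords() {
            return recordCounter;
        }
    }

    public static void main(String[] args) {
        PartitionWriterModel writer = new PartitionWriterModel(3);
        writer.write("/topics/t/field=a", "r1");
        writer.write("/topics/t/field=b", "r2");
        writer.write("/topics/t/field=a", "r3"); // third record triggers the commit
        System.out.println(writer.commits());
    }
}
```

Under these assumptions, a flush size of 3 produces one commit containing two files, one with 2 records and one with 1 record, so an individual HDFS file can hold fewer records than the flush size.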
https://docs.confluent.io/current/connect/kafka-connect-hdfs/configuration_options.html
flush.size
Number of records written to store before invoking file commits.
Type: int
Importance: high
Thanks,
Chinchu