Posted to dev@kafka.apache.org by "Randall Hauch (JIRA)" <ji...@apache.org> on 2018/05/29 18:18:00 UTC
[jira] [Resolved] (KAFKA-6831) FileStreamSink is very slow

     [ https://issues.apache.org/jira/browse/KAFKA-6831?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Randall Hauch resolved KAFKA-6831.
----------------------------------
       Resolution: Invalid
    Fix Version/s:     (was: 1.1.0)

[~vrmprabhat], first of all, these kinds of questions are better asked through the user mailing list or other online resources. But to get you started, Kafka Connect uses normal producers and consumers under the covers, so be sure that you set the consumer settings in the Connect worker configuration (using the {{consumer.}} prefix) to handle traffic like yours. It's often a balancing act between batch sizes and message sizes to handle your throughput. A few commonly used consumer properties include {{fetch.min.bytes}}, {{max.partition.fetch.bytes}}, and {{fetch.max.bytes}} (and {{fetch.message.max.bytes}} for the old consumer); these might be too small given the message size, number of partitions, throughput, etc., and might be instructing Connect to consume too many small batches.
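
As a rough sketch of the above, consumer overrides for sink tasks can be placed in the Connect worker configuration with the {{consumer.}} prefix. The file name and all values below are illustrative assumptions rather than recommendations; they would need to be tuned against the actual message size, number of partitions, and throughput.

```properties
# Connect worker configuration (for example, connect-standalone.properties).
# All values are illustrative placeholders, not tuned recommendations.

# Ask the broker to accumulate at least this many bytes before answering a
# fetch, so the consumer receives fewer, larger batches.
consumer.fetch.min.bytes=65536

# Upper bound on the data returned per partition in a single fetch.
consumer.max.partition.fetch.bytes=4194304

# Upper bound on the data returned per fetch request across all partitions.
consumer.fetch.max.bytes=52428800
```

Note that a larger {{fetch.min.bytes}} trades latency for batch size: the broker holds a fetch for up to {{fetch.max.wait.ms}} while waiting for that much data to arrive.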


> FileStreamSink is very slow
> ---------------------------
>
>                 Key: KAFKA-6831
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6831
>             Project: Kafka
>          Issue Type: Test
>          Components: consumer
>    Affects Versions: 1.1.0
>            Reporter: Prabhat Verma
>            Priority: Major
>
> Hi Team,
>  
> I am very new to Kafka. My project requirement is to fetch data from a source location and place it in another location (the consumer location). I am using the FileStreamSink class to do this.
> I am using a Linux machine with 32 GB of memory.
> When I start FileStreamSink, it syncs to the consumer location very slowly. I am not sure why it takes 2000 messages at a time and then syncs them. After that it waits for a few seconds and then syncs again. This waiting time increases with each run.
>
> I am processing 600K messages, but it took 1 hour to process only 60K messages.
>  
> Below are my config details:
>
> connect-file-sink.property
> Name = local-file
> Connector.class = FileStreamSource
> task.max=20
> file=/d/d1/kafka/destination/outfile.txt
> topic=abc_partion_20
>
> connect-file-source.property
> Name = local-file
> Connector.class = FileStreamSource
> task.max=20
> file=/d/d1/kafka/source/infile.txt
> topic=abc_partion_20
>  
> Can you please help?
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)