Posted to users@kafka.apache.org by Himanshu Shukla <hi...@gmail.com> on 2021/04/01 04:25:43 UTC

Kafka Connect Distributed Mode Issues

Hi,
I am using the kafka-connect-file-pulse connector to scan around 20K
files. After the scan step, the whole Connect cluster becomes
unresponsive. I cannot even access the localhost:8083/connectors/ URL;
requests to it time out.


I see the errors below in the Connect logs. Has anyone faced this issue?

Please advise if I am doing something wrong.


[2021-03-31 16:21:58,920] INFO Scanning local file system directory '/apps/datafiles_1/cm_dir/QA1/' (io.streamthoughts.kafka.connect.filepulse.scanner.LocalFileSystemScanner:241)
[2021-03-31 16:22:57,586] WARN [Worker clientId=connect-1, groupId=connect-cluster] This member will leave the group because consumer poll timeout has expired. This means the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time processing messages. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records. (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:1051)
[2021-03-31 16:22:57,586] INFO [Worker clientId=connect-1, groupId=connect-cluster] *Member connect-1-064cf0bf-b834-40d2-9e72-e61b229157c4 sending LeaveGroup request to coordinator URL:9092* (id: 2147483646 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator:822)
[2021-03-31 16:23:24,562] ERROR Request to leader to reconfigure connector tasks failed (org.apache.kafka.connect.runtime.distributed.DistributedHerder:1037)
*org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Request timed out*
        at org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:97)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$18.run(DistributedHerder.java:1034)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
[2021-03-31 16:23:24,562] ERROR* Failed to reconfigure connector's tasks, retrying after backoff: (org.apache.kafka.connect.runtime.distributed.DistributedHerder:958)*
org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Request timed out
        at org.apache.kafka.connect.runtime.rest.RestClient.httpRequest(RestClient.java:97)
        at org.apache.kafka.connect.runtime.distributed.DistributedHerder$18.run(DistributedHerder.java:1034)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:51
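
As a reading aid for the log above, the gap between the scan starting and the poll timeout warning can be worked out from the two timestamps. This is only a small Python sketch; the timestamp strings are copied from the log, and the variable names are made up for illustration. It shows how long the worker went without polling, and does not by itself say which timeout setting was in effect.

```python
from datetime import datetime

# Timestamps copied from the INFO (scan start) and WARN (poll timeout) log lines.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
scan_started = datetime.strptime("2021-03-31 16:21:58,920", LOG_TS_FORMAT)
poll_expired = datetime.strptime("2021-03-31 16:22:57,586", LOG_TS_FORMAT)

# Elapsed wall-clock time between the two log entries.
elapsed_seconds = (poll_expired - scan_started).total_seconds()
print(f"{elapsed_seconds:.3f} s between scan start and poll timeout warning")
```

The result is a little under one minute.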

-- 
Regards,
Himanshu Shukla

Re: Kafka Connect Distributed Mode Issues

Posted by Himanshu Shukla <hi...@gmail.com>.

Has anyone faced this before? The connectors URL is returning a 500 with a
request timeout.

-- 
Regards,
Himanshu Shukla

Re: Kafka Connect Distributed Mode Issues

Posted by Liam Clarke-Hutchinson <li...@adscale.co.nz>.

Yeah, looks like it's an issue with the plugin. I don't have any experience
with it, sorry.

On Tue, 6 Apr. 2021, 12:32 am Himanshu Shukla, <hi...@gmail.com>
wrote:

> This is the connect-distributed.properties I am using. I have changed the
> last two properties but still have the same issue.
>
> Is it related to the source connector (File Pulse in my case)? It scans
> around 20K files and gets stuck. With a smaller scan directory, it runs
> properly.

Re: Kafka Connect Distributed Mode Issues

Posted by Himanshu Shukla <hi...@gmail.com>.

bootstrap.servers=b-1:9092,b-2:9092
group.id=connect-cluster
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=true
value.converter.schemas.enable=true
offset.storage.topic=connect-offsets-2
offset.storage.replication.factor=2
#offset.storage.partitions=25
config.storage.topic=connect-configs-2
config.storage.replication.factor=2
status.storage.topic=connect-status-2
status.storage.replication.factor=2
#status.storage.partitions=5
offset.flush.interval.ms=10000
#rest.host.name=
#rest.port=8083
rest.advertised.host.name=172.16.234.122
rest.advertised.port=8083
plugin.path=/apps/libs/streamthoughts-kafka-connect-file-pulse-1.6.0/
consumer.max.poll.records=100
consumer.max.poll.interval.ms=600000


This is the connect-distributed.properties I am using. I have changed the
last two properties but still have the same issue.

Is it related to the source connector (File Pulse in my case)? It scans
around 20K files and gets stuck. With a smaller scan directory, it runs
properly.
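
For context on the last two properties: the Kafka Connect documentation describes worker properties prefixed with consumer. as overrides for the consumers created for sink tasks, with producer. as the equivalent for source tasks. The Python sketch below only separates such overrides from worker-level settings. The excerpt is a subset of the properties above, and split_by_prefix is a hypothetical helper, not part of any Kafka tooling. Whether these overrides affect the worker group membership seen in the log is not established in this thread.

```python
# Subset of the worker properties shown above (an excerpt, for illustration).
WORKER_PROPERTIES = """\
offset.flush.interval.ms=10000
#rest.port=8083
rest.advertised.port=8083
consumer.max.poll.records=100
consumer.max.poll.interval.ms=600000
"""

def split_by_prefix(text, prefix="consumer."):
    """Split properties-style text into (prefixed, other) dicts.

    Keys in the first dict have the prefix stripped. Blank lines and
    lines starting with '#' are skipped.
    """
    prefixed, other = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key.startswith(prefix):
            prefixed[key[len(prefix):]] = value
        else:
            other[key] = value
    return prefixed, other

consumer_overrides, worker_level = split_by_prefix(WORKER_PROPERTIES)
print("consumer overrides:", consumer_overrides)
print("worker-level:", worker_level)
```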

On Mon, Apr 5, 2021 at 2:52 PM Liam Clarke-Hutchinson <
liam.clarke@adscale.co.nz> wrote:

> Hi Himanshu,
>
> Have you adjusted your consumer properties as the error message suggested?
>
> Alternatively, reduce consumer.max.poll.records in the worker
> config.


-- 
Regards,
Himanshu Shukla

Re: Kafka Connect Distributed Mode Issues

Posted by Liam Clarke-Hutchinson <li...@adscale.co.nz>.

Hi Himanshu,

Have you adjusted your consumer properties as the error message suggested?

Alternatively, reduce consumer.max.poll.records in the worker
config.

Basically, the connector you're using is spending too much time processing
in the poll loop, so either tweak the properties as mentioned in the error
message, or reduce the number of records processed in a batch so that it
doesn't hit that timeout.

If you have adjusted these properties, and still have issues, please
respond with your current worker properties to make it easier to debug.

Please note that for any KC sink or source connector, response times from
the underlying data source/store can impact performance, so you may need
to look into that aspect as well.
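
One rough way to check the underlying store is to time a plain directory listing outside of Connect. The Python sketch below is an assumption-laden illustration: time_directory_scan is a hypothetical helper, it counts only regular files directly under the given path (no recursion), and it measures the raw listing only, not whatever the connector does per file. The demo runs against a throwaway directory; the real scan directory would be passed in instead.

```python
import os
import tempfile
import time

def time_directory_scan(path):
    """Count regular files directly under path and time the listing."""
    start = time.perf_counter()
    with os.scandir(path) as entries:
        file_count = sum(1 for entry in entries if entry.is_file())
    return file_count, time.perf_counter() - start

# Demo on a temporary directory holding three empty files.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(3):
        open(os.path.join(demo_dir, f"file_{i}.txt"), "w").close()
    count, seconds = time_directory_scan(demo_dir)
    print(f"{count} files listed in {seconds:.4f} s")
```

If listing the real directory takes a noticeable fraction of a minute, the file system is a plausible contributor; if it returns quickly, the time is more likely being spent elsewhere.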

Cheers,

Liam Clarke-Hutchinson
