Posted to dev@kafka.apache.org by Per Steffensen <pe...@gmail.com> on 2017/05/16 09:57:46 UTC

Kafka Connect: Too much restarting with a SourceConnector with dynamic set of tasks

Hi

Kafka (Connect) 0.10.2.1

I am writing my own SourceConnector. It will communicate with a remote
server, and continuously calculate the set of tasks that have to be
running. Each task also makes a connection to the remote server from
which it will get its data to forward to Kafka.

When the SourceConnector realizes that the set of tasks has to be
modified, it makes sure the taskConfigs method will return config for the
new complete set of tasks (likely including tasks that already existed
before, probably some new tasks, and maybe some of the existing tasks
will no longer be included). After that the SourceConnector calls
context.requestTaskReconfiguration. This results in the current instance
of my SourceConnector and all existing/running tasks being stopped, a new
instance of my SourceConnector being created, and all tasks (those that
existed before and new ones) being started.

It all works nicely, but because my SourceConnector and my SourceTasks
have to (re)establish connection and (re)initialize the streaming of
data, and because my set of tasks changes fairly often, and because it
very often contains tasks that were also in the set before the
change, I end up having lots of stop/start of tasks that really just
ought to continue running.

Any plans on making this more delta-ish, so that when doing a
requestTaskReconfiguration:
* Only tasks that were not already in the task-config-set before the
requestTaskReconfiguration are started
* Only tasks that were in the task-config-set before the
requestTaskReconfiguration, but not in the set after, are stopped
* Tasks that are in the task-config-set both before and after
requestTaskReconfiguration are just allowed to keep running, without
restarting
* Not so important: Do not create a new instance of the SourceConnector
just because it has a changed task-config-set
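
A minimal sketch of the delta described in the list above, in plain Java.
It is not Kafka Connect code: the class TaskConfigDelta and its fields are
hypothetical names, and it assumes a task is identified by equality of its
config map, which may not match how the framework identifies tasks.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical helper (not part of Kafka Connect): splits the task configs
 * before and after a reconfiguration into start / stop / keep sets.
 * Assumption: two tasks are "the same" when their config maps are equal.
 */
final class TaskConfigDelta {
    final Set<Map<String, String>> toStart = new LinkedHashSet<>();
    final Set<Map<String, String>> toStop = new LinkedHashSet<>();
    final Set<Map<String, String>> toKeep = new LinkedHashSet<>();

    TaskConfigDelta(Collection<Map<String, String>> before,
                    Collection<Map<String, String>> after) {
        Set<Map<String, String>> oldSet = new HashSet<>(before);
        Set<Map<String, String>> newSet = new HashSet<>(after);

        // Configs in the new set: keep if already present, otherwise start.
        for (Map<String, String> cfg : after) {
            if (oldSet.contains(cfg)) {
                toKeep.add(cfg);
            } else {
                toStart.add(cfg);
            }
        }
        // Configs only in the old set: stop.
        for (Map<String, String> cfg : before) {
            if (!newSet.contains(cfg)) {
                toStop.add(cfg);
            }
        }
    }
}
```

With such a split, only toStart and toStop would need any work on a
reconfiguration; everything in toKeep would be left running.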

Or am I doing something wrong in my SourceConnector? Is there a
different way that I should maintain a dynamic set of tasks?

Thanks!!!

Regards, Per Steffensen


Re: Kafka Connect: Too much restarting with a SourceConnector with dynamic set of tasks

Posted by Per Steffensen <pe...@gmail.com>.
Thanks a lot for responding, Randall! See my comments below.

Regards, Per Steffensen

On 22/05/17 22:36, Randall Hauch wrote:
> You're not doing anything wrong, but I suspect you're requesting task 
> reconfiguration more frequently than was originally envisioned, which 
> means that the current implementation is not as optimal for your case.
OK thanks for confirming
>
> I'm not sure how much effort is required to implement this new 
> behavior. The logic for the standalone worker is pretty 
> straightforward, but the logic for the distributed worker is going to 
> be much more involved.
Yeah, when I realized the "problem" I had a short look at the code to
see if it was easily fixable. I never went deep into it, but it seems
like more than just an hour of work.
> But we also need to be careful about changing existing behavior, since 
> it's not hard to imagine connectors that might expect that all tasks 
> be restarted when there are any changes to the task configurations.
FWIW, I think it is a little hard to imagine :-)
> If there's any potential that this is the case, we'd have to be sure 
> to keep the existing behavior as the default but to somehow enable the 
> new behavior if desired.
I definitely agree!
>
> One possibility is to add an overloaded 
> requestTaskReconfiguration(boolean changedOnly) that specifies whether 
> only changed tasks should be reconfigured. This way the existing 
> requestTaskReconfiguration() method could be changed to call 
> requestTaskReconfiguration(false), and then the implementation has to 
> deal with this.
Yep, or make it an optional standard configuration that you can always
give a connector. Potato, potato
>
> But again, the bigger challenge is to implement this new behavior in 
> the DistributedHerder. OTOH, perhaps it's not as complicated as I 
> might guess.
Well, I would really like to see it happen. Anyone up for it? Am I
allowed to create a ticket on this?
What if I would like to give it a shot myself? Is there a committer who
would help review and eventually commit?
Which branch should I make a PR to?

Re: Kafka Connect: Too much restarting with a SourceConnector with dynamic set of tasks

Posted by Randall Hauch <rh...@gmail.com>.
You're not doing anything wrong, but I suspect you're requesting task
reconfiguration more frequently than was originally envisioned, which means
that the current implementation is not as optimal for your case.

I'm not sure how much effort is required to implement this new behavior.
The logic for the standalone worker is pretty straightforward, but the
logic for the distributed worker is going to be much more involved. But we
also need to be careful about changing existing behavior, since it's not
hard to imagine connectors that might expect that all tasks be restarted
when there are any changes to the task configurations. If there's any
potential that this is the case, we'd have to be sure to keep the existing
behavior as the default but to somehow enable the new behavior if desired.

One possibility is to add an overloaded requestTaskReconfiguration(boolean
changedOnly) that specifies whether only changed tasks should be
reconfigured. This way the existing requestTaskReconfiguration() method
could be changed to call requestTaskReconfiguration(false), and then the
implementation has to deal with this.
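
A rough sketch of what that overload could look like, in plain Java. The
interface ReconfigurableContext and the class RecordingContext are
hypothetical stand-ins made up for illustration; only the no-argument
requestTaskReconfiguration() exists today, and the boolean overload is just
the proposal above. A default method is used for brevity and needs Java 8.

```java
/**
 * Hypothetical stand-in for the connector context, for illustration only.
 */
interface ReconfigurableContext {
    /**
     * Proposed overload (does not exist in Kafka Connect): when changedOnly
     * is true, only tasks whose configuration changed would be reconfigured.
     */
    void requestTaskReconfiguration(boolean changedOnly);

    /** Existing call; delegating with false keeps today's behavior as the default. */
    default void requestTaskReconfiguration() {
        requestTaskReconfiguration(false);
    }
}

/** Hypothetical implementation that only records what was requested. */
final class RecordingContext implements ReconfigurableContext {
    Boolean lastChangedOnly; // null until a request has been made

    @Override
    public void requestTaskReconfiguration(boolean changedOnly) {
        lastChangedOnly = changedOnly;
    }
}
```

Existing connectors that call the no-argument method would see no change,
while a connector with a frequently changing task set could opt in by
passing true.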

But again, the bigger challenge is to implement this new behavior in the
DistributedHerder. OTOH, perhaps it's not as complicated as I might guess.


