You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@kafka.apache.org by Randall Hauch <rh...@gmail.com> on 2017/09/10 18:12:44 UTC

[DISCUSS] KIP-199: Add Kafka Connect offset reset tool

Hi all,

KIP-199 [1] describes a new tool that will allow Connect operators to read,
update, and remove offsets stored by Connect runtime. This capability has
been often asked for by Connect users. The proposal is simple but flexible.
Please review and add feedback.

Best regards,

Randall

[1]
https://cwiki.apache.org/confluence/display/KAFKA/KIP-199%3A+Add+Kafka+Connect+offset+reset+tool

Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

Posted by Gwen Shapira <gw...@confluent.io>.
Totally agree about getting things right in the first round. In addition, I
believe that it is important that names of our tools will describe what the
tools do as accurately as possible - this improves usability. Therefore, we
should shift the discussion to whether the tool should handle all offsets
or just source offsets. The name of the tool will naturally be derived from
the decision of what the tool will do.

Now that it is an important discussion about functionality and not
bike-shedding about names:

I am strongly supportive of duplicating functionality and having a Connect
tool that handles all aspects of resetting connector state. As you said,
the specifics of how Connect manages its offsets are an implementation
details that I don't expect users to know and care about.

I also want to point out that the Streams reset tool already duplicates
some functionality and this doesn't seem to be a problem because the
context (Streams) is well defined.

Gwen



On Tue, Sep 12, 2017 at 7:30 PM Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> I don't think it's bikeshedding and it is a fair question. I try to avoid
> that because this is part of the point of KIPs -- avoid extra pain around
> usability, documentation, maintenance, and compatibility and deprecation by
> just trying to get things right on the first go around.
>
> It's obviously just my preference, but I would rather name it what we
> ultimately want and document limitations while we fill in the gaps. This is
> a case where I really don't know whether we'll want to extend it to
> effectively duplicate some of the behavior of an existing tool for the sake
> of consistency, so I raised it for discussion.
>
> -Ewen
>
> On Tue, Sep 12, 2017 at 10:51 AM, Gwen Shapira <gw...@confluent.io> wrote:
>
> > >
> > >
> > > > * re: naming, can we avoid including 'source' in the command name?
> even
> > > if
> > > > that's all it supports today, I don't think we want to restrict it.
> > While
> > > > we only *require* this for source offsets, I think for users it will,
> > > > long-term, be way more natural to consider connect offsets generally
> > and
> > > > not need to know how to drop down to consumer offsets for sink
> > > connectors.
> > > > In fact, in an ideal world, many users/operators may not even
> > > *understand*
> > > > consumer offsets deeply while still being generally familiar with
> > connect
> > > > offsets. We can always include an error message for now if they try
> to
> > do
> > > > something with a sink connector (which we presumably need to detect
> > > anyway
> > > > to correctly handle source connectors).
> > > >
> > >
> > > ack
> > >
> > >
> > Sorry about bikeshedding, but:
> > I think a general name for something that has limited functionality is
> very
> > confusing (and we've seen this play out before). Why not name the tool to
> > describe what it does and change its name if we change the functionality?
> >
> > >
> > >
> >
>

Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
I don't think it's bikeshedding and it is a fair question. I try to avoid
that because this is part of the point of KIPs -- avoid extra pain around
usability, documentation, maintenance, and compatibility and deprecation by
just trying to get things right on the first go around.

It's obviously just my preference, but I would rather name it what we
ultimately want and document limitations while we fill in the gaps. This is
a case where I really don't know whether we'll want to extend it to
effectively duplicate some of the behavior of an existing tool for the sake
of consistency, so I raised it for discussion.

-Ewen

On Tue, Sep 12, 2017 at 10:51 AM, Gwen Shapira <gw...@confluent.io> wrote:

> >
> >
> > > * re: naming, can we avoid including 'source' in the command name? even
> > if
> > > that's all it supports today, I don't think we want to restrict it.
> While
> > > we only *require* this for source offsets, I think for users it will,
> > > long-term, be way more natural to consider connect offsets generally
> and
> > > not need to know how to drop down to consumer offsets for sink
> > connectors.
> > > In fact, in an ideal world, many users/operators may not even
> > *understand*
> > > consumer offsets deeply while still being generally familiar with
> connect
> > > offsets. We can always include an error message for now if they try to
> do
> > > something with a sink connector (which we presumably need to detect
> > anyway
> > > to correctly handle source connectors).
> > >
> >
> > ack
> >
> >
> Sorry about bikeshedding, but:
> I think a general name for something that has limited functionality is very
> confusing (and we've seen this play out before). Why not name the tool to
> describe what it does and change its name if we change the functionality?
>
> >
> >
>

Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

Posted by Gwen Shapira <gw...@confluent.io>.
>
>
> > * re: naming, can we avoid including 'source' in the command name? even
> if
> > that's all it supports today, I don't think we want to restrict it. While
> > we only *require* this for source offsets, I think for users it will,
> > long-term, be way more natural to consider connect offsets generally and
> > not need to know how to drop down to consumer offsets for sink
> connectors.
> > In fact, in an ideal world, many users/operators may not even
> *understand*
> > consumer offsets deeply while still being generally familiar with connect
> > offsets. We can always include an error message for now if they try to do
> > something with a sink connector (which we presumably need to detect
> anyway
> > to correctly handle source connectors).
> >
>
> ack
>
>
Sorry about bikeshedding, but:
I think a general name for something that has limited functionality is very
confusing (and we've seen this play out before). Why not name the tool to
describe what it does and change its name if we change the functionality?

>
>

Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

Posted by Randall Hauch <rh...@gmail.com>.
Thanks for the comments. As mentioned in the VOTE thread, I've withdrawn
the vote since this proposal is obviously incomplete. I have updated the
KIP with your comments, and will discuss each inline below:

On Mon, Sep 11, 2017 at 11:30 PM, Ewen Cheslack-Postava <ew...@confluent.io>
wrote:

> A couple of comments:
>
> * I made some minor, non-critical updates to the motivation section to add
> a bit more color/background/clarity. In particular, clarifying how things
> are connected to consumer groups for sinks. Still, the motivation isn't
> entirely clear about how this works -- it is definitely a completely
> offline tool. Specifying that users stop a connector entirely, read/modify
> offsets, then restart, would probably be useful info for users of the
> feature.
>

ack


> * re: naming, can we avoid including 'source' in the command name? even if
> that's all it supports today, I don't think we want to restrict it. While
> we only *require* this for source offsets, I think for users it will,
> long-term, be way more natural to consider connect offsets generally and
> not need to know how to drop down to consumer offsets for sink connectors.
> In fact, in an ideal world, many users/operators may not even *understand*
> consumer offsets deeply while still being generally familiar with connect
> offsets. We can always include an error message for now if they try to do
> something with a sink connector (which we presumably need to detect anyway
> to correctly handle source connectors).
>

ack


> * re: naming of bin/ script, this is largely an artifact of my prototype
> and i don't think an explicit decision (my bad), but currently the connect
> commands don't have a kafka- prefix. I'm fine going either way, but if we
> standardize on kafka-connect-*, I would like to see a simple follow up to
> add the corresponding kafka-connect-standalone[.sh] and
> kafka-connect-distributed[.sh] commands to get things consistent, ideally
> with corresponding deprecations and plans for removal.
>

ack. Let's stick with "connect-offsets.sh" for now to follow the existing
convention.


> * Does export print all the offset commits currently available in the
> topic, or only the most recent? I am curious since this tool could
> potentially be useful for rewinding offsets if things get into a bad state,
> in which case printing all offsets still available could be useful. I think
> this should be a pretty safe extension since if you pipe all available
> offsets -> write offsets, you'll still just get the latest offset commit
> once the connector is restarted.
>

Addressed with a new option.


> * This should probably specify schemaless data. I know there were issues
> wrt internal converters that basically enforced this, but we should just be
> explicit about it moving forward.
>

I was hoping to just use the existing offset storage reader/writer
mechanism, but the OffsetStorageReader requires knowledge of the partition
information. Obviously we need to think more about this whole tool. One
option is to use a different implementation, or to enhance the existing
implementation class (not API) to allow doing this.

Anyway, if this is the way we go and we use the existing worker configs to
start the offset storage reader/writer, I'm not sure why schemaless would
matter.


> * nit, but i'd just change --connector-names -> --connectors. i don't think
> we use -names anywhere else yet and the natural plural seems... more
> natural to me
>

ack.


> * While we've talked about it for awhile in various KIP and pre-KIP
> discussions, standardizing on JSON input/output is new. I'm fine with it,
> but I do wonder if we should include a format flag by default or just add
> it as needed. All the "old" commands will need an explicit format flag if
> we want them to support JSON.
>

sure, we can discuss. Personally, I'd love to see YAML since that's less
verbose and easier to deal with, but that's not really an option since it
adds other complexity not to mention that nothing in Kafka uses YAML to my
knowledge.


> * can we get rid of --offsets-file? isn't this the same as piping stdout to
> a file? does it benefit us some other way i am missing? I was actually
> confused at first b/c I thought --offsets-file was referring to the
> standalone mode file to *read from* not the output JSON file to *write to*.
>

ack.


> * export output: definitely a nit, but do we need to repeat the offset in
> all the entries or should the top-level just be a map of connector -> list
> of (partition, offset)? overhead here is probably minimal so I don't feel
> strongly, but I don't see a good reason not to use a more efficient
> encoding.
>

ack. I agree nesting those related to a connector within a connector-level
doc is cleaner and probably easier to use. I considered that at first, but
then I thought that just having the pairs may be cleaner for representing
history. I'm not sure that's true, and the nested approach definitely seems
better. Another reason to pull the vote.


> * > and output only those partition-offset pairs for a named connector:
>  I am unclear whether the output file is compatible as a normal connect
> offsets file, is only for this tool, or something else
>

ack. It's not a normal connect output file. I thought making them the same
would couple this tool too much to the current file storage implementation.


> * mentioned in the first item, but since deleting offsets w/ a null value
> is mentioned at the end of the KIP, I'll reiterate that I think it is
> important to call out when this applies -- if done on a live connector,
> only a rebalance would guarantee it applies, and in the meantime something
> could overwrite. only safe way to apply any of this, including offset
> delete, is to stop the connector, apply operation, then continue.
>

ack.


>
> -Ewen
>
>
> On Sun, Sep 10, 2017 at 3:12 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > bq. How about calling it kafka-connect-source-offset-tool.sh
> >
> > This is better.
> > Going over existing .sh files, some have verb in their names while some
> > don't.
> >
> > +1 from me.
> >
> > On Sun, Sep 10, 2017 at 2:53 PM, Randall Hauch <rh...@gmail.com> wrote:
> >
> > > Thanks for the comments! Specific comments inline below.
> > >
> > > Regards,
> > >
> > > Randall
> > >
> > > On Sun, Sep 10, 2017 at 2:42 PM, Ted Yu <yu...@gmail.com> wrote:
> > >
> > > > bq. connector restart and the next message
> > > >
> > > > The last part of the sentence seems to be incomplete.
> > > >
> > >
> > > Fixed.
> > >
> > >
> > > > bq. command line tool called kafka-connect-source-offset-reset.sh
> > > >
> > > > From the description, the tool does more than resetting (e.g.
> > deleting).
> > > > How about calling it kafka-connect-source-offset-tool.sh
> > > >
> > >
> > > None of the other tools or scripts in the Kafka distribution have
> "tool"
> > in
> > > the name, so how about "kafka-connect-source-offsets.sh"?
> > >
> > >
> > > >
> > > > bq. bin/kafka-connect-source-offset-reset.sh
> > > > --config=my-worker-config.properties --export
> > > >
> > > > From the table above, export mode is mentioned as required. However,
> > > either
> > > > --export or --import is required.
> > > > Better note this in the KIP.
> > > >
> > >
> > > Addressed in a few places. Hopefully it is more clear now.
> > >
> > >
> > > >
> > > > bq. but will remove the offsets for the partition with file "b"
> > > >
> > > > Please move the sample JSON below the above description.
> > > >
> > >
> > > Done.
> > >
> > >
> > > >
> > > > Cheers
> > > >
> > > > On Sun, Sep 10, 2017 at 11:12 AM, Randall Hauch <rh...@gmail.com>
> > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > KIP-199 [1] describes a new tool that will allow Connect operators
> to
> > > > read,
> > > > > update, and remove offsets stored by Connect runtime. This
> capability
> > > has
> > > > > been often asked for by Connect users. The proposal is simple but
> > > > flexible.
> > > > > Please review and add feedback.
> > > > >
> > > > > Best regards,
> > > > >
> > > > > Randall
> > > > >
> > > > > [1]
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > > 199%3A+Add+Kafka+Connect+offset+reset+tool
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

Posted by Ewen Cheslack-Postava <ew...@confluent.io>.
A couple of comments:

* I made some minor, non-critical updates to the motivation section to add
a bit more color/background/clarity. In particular, clarifying how things
are connected to consumer groups for sinks. Still, the motivation isn't
entirely clear about how this works -- it is definitely a completely
offline tool. Specifying that users stop a connector entirely, read/modify
offsets, then restart, would probably be useful info for users of the
feature.
* re: naming, can we avoid including 'source' in the command name? even if
that's all it supports today, I don't think we want to restrict it. While
we only *require* this for source offsets, I think for users it will,
long-term, be way more natural to consider connect offsets generally and
not need to know how to drop down to consumer offsets for sink connectors.
In fact, in an ideal world, many users/operators may not even *understand*
consumer offsets deeply while still being generally familiar with connect
offsets. We can always include an error message for now if they try to do
something with a sink connector (which we presumably need to detect anyway
to correctly handle source connectors).
* re: naming of bin/ script, this is largely an artifact of my prototype
and i don't think an explicit decision (my bad), but currently the connect
commands don't have a kafka- prefix. I'm fine going either way, but if we
standardize on kafka-connect-*, I would like to see a simple follow up to
add the corresponding kafka-connect-standalone[.sh] and
kafka-connect-distributed[.sh] commands to get things consistent, ideally
with corresponding deprecations and plans for removal.
* Does export print all the offset commits currently available in the
topic, or only the most recent? I am curious since this tool could
potentially be useful for rewinding offsets if things get into a bad state,
in which case printing all offsets still available could be useful. I think
this should be a pretty safe extension since if you pipe all available
offsets -> write offsets, you'll still just get the latest offset commit
once the connector is restarted.
* This should probably specify schemaless data. I know there were issues
wrt internal converters that basically enforced this, but we should just be
explicit about it moving forward.
* nit, but i'd just change --connector-names -> --connectors. i don't think
we use -names anywhere else yet and the natural plural seems... more
natural to me
* While we've talked about it for awhile in various KIP and pre-KIP
discussions, standardizing on JSON input/output is new. I'm fine with it,
but I do wonder if we should include a format flag by default or just add
it as needed. All the "old" commands will need an explicit format flag if
we want them to support JSON.
* can we get rid of --offsets-file? isn't this the same as piping stdout to
a file? does it benefit us some other way i am missing? I was actually
confused at first b/c I thought --offsets-file was referring to the
standalone mode file to *read from* not the output JSON file to *write to*.
* export output: definitely a nit, but do we need to repeat the offset in
all the entries or should the top-level just be a map of connector -> list
of (partition, offset)? overhead here is probably minimal so I don't feel
strongly, but I don't see a good reason not to use a more efficient
encoding.
* > and output only those partition-offset pairs for a named connector:
 I am unclear whether the output file is compatible as a normal connect
offsets file, is only for this tool, or something else
* mentioned in the first item, but since deleting offsets w/ a null value
is mentioned at the end of the KIP, I'll reiterate that I think it is
important to call out when this applies -- if done on a live connector,
only a rebalance would guarantee it applies, and in the meantime something
could overwrite. only safe way to apply any of this, including offset
delete, is to stop the connector, apply operation, then continue.

-Ewen


On Sun, Sep 10, 2017 at 3:12 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. How about calling it kafka-connect-source-offset-tool.sh
>
> This is better.
> Going over existing .sh files, some have verb in their names while some
> don't.
>
> +1 from me.
>
> On Sun, Sep 10, 2017 at 2:53 PM, Randall Hauch <rh...@gmail.com> wrote:
>
> > Thanks for the comments! Specific comments inline below.
> >
> > Regards,
> >
> > Randall
> >
> > On Sun, Sep 10, 2017 at 2:42 PM, Ted Yu <yu...@gmail.com> wrote:
> >
> > > bq. connector restart and the next message
> > >
> > > The last part of the sentence seems to be incomplete.
> > >
> >
> > Fixed.
> >
> >
> > > bq. command line tool called kafka-connect-source-offset-reset.sh
> > >
> > > From the description, the tool does more than resetting (e.g.
> deleting).
> > > How about calling it kafka-connect-source-offset-tool.sh
> > >
> >
> > None of the other tools or scripts in the Kafka distribution have "tool"
> in
> > the name, so how about "kafka-connect-source-offsets.sh"?
> >
> >
> > >
> > > bq. bin/kafka-connect-source-offset-reset.sh
> > > --config=my-worker-config.properties --export
> > >
> > > From the table above, export mode is mentioned as required. However,
> > either
> > > --export or --import is required.
> > > Better note this in the KIP.
> > >
> >
> > Addressed in a few places. Hopefully it is more clear now.
> >
> >
> > >
> > > bq. but will remove the offsets for the partition with file "b"
> > >
> > > Please move the sample JSON below the above description.
> > >
> >
> > Done.
> >
> >
> > >
> > > Cheers
> > >
> > > On Sun, Sep 10, 2017 at 11:12 AM, Randall Hauch <rh...@gmail.com>
> > wrote:
> > >
> > > > Hi all,
> > > >
> > > > KIP-199 [1] describes a new tool that will allow Connect operators to
> > > read,
> > > > update, and remove offsets stored by Connect runtime. This capability
> > has
> > > > been often asked for by Connect users. The proposal is simple but
> > > flexible.
> > > > Please review and add feedback.
> > > >
> > > > Best regards,
> > > >
> > > > Randall
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > > 199%3A+Add+Kafka+Connect+offset+reset+tool
> > > >
> > >
> >
>

Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

Posted by Ted Yu <yu...@gmail.com>.
bq. How about calling it kafka-connect-source-offset-tool.sh

This is better.
Going over existing .sh files, some have verb in their names while some
don't.

+1 from me.

On Sun, Sep 10, 2017 at 2:53 PM, Randall Hauch <rh...@gmail.com> wrote:

> Thanks for the comments! Specific comments inline below.
>
> Regards,
>
> Randall
>
> On Sun, Sep 10, 2017 at 2:42 PM, Ted Yu <yu...@gmail.com> wrote:
>
> > bq. connector restart and the next message
> >
> > The last part of the sentence seems to be incomplete.
> >
>
> Fixed.
>
>
> > bq. command line tool called kafka-connect-source-offset-reset.sh
> >
> > From the description, the tool does more than resetting (e.g. deleting).
> > How about calling it kafka-connect-source-offset-tool.sh
> >
>
> None of the other tools or scripts in the Kafka distribution have "tool" in
> the name, so how about "kafka-connect-source-offsets.sh"?
>
>
> >
> > bq. bin/kafka-connect-source-offset-reset.sh
> > --config=my-worker-config.properties --export
> >
> > From the table above, export mode is mentioned as required. However,
> either
> > --export or --import is required.
> > Better note this in the KIP.
> >
>
> Addressed in a few places. Hopefully it is more clear now.
>
>
> >
> > bq. but will remove the offsets for the partition with file "b"
> >
> > Please move the sample JSON below the above description.
> >
>
> Done.
>
>
> >
> > Cheers
> >
> > On Sun, Sep 10, 2017 at 11:12 AM, Randall Hauch <rh...@gmail.com>
> wrote:
> >
> > > Hi all,
> > >
> > > KIP-199 [1] describes a new tool that will allow Connect operators to
> > read,
> > > update, and remove offsets stored by Connect runtime. This capability
> has
> > > been often asked for by Connect users. The proposal is simple but
> > flexible.
> > > Please review and add feedback.
> > >
> > > Best regards,
> > >
> > > Randall
> > >
> > > [1]
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 199%3A+Add+Kafka+Connect+offset+reset+tool
> > >
> >
>

Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

Posted by Randall Hauch <rh...@gmail.com>.
Thanks for the comments! Specific comments inline below.

Regards,

Randall

On Sun, Sep 10, 2017 at 2:42 PM, Ted Yu <yu...@gmail.com> wrote:

> bq. connector restart and the next message
>
> The last part of the sentence seems to be incomplete.
>

Fixed.


> bq. command line tool called kafka-connect-source-offset-reset.sh
>
> From the description, the tool does more than resetting (e.g. deleting).
> How about calling it kafka-connect-source-offset-tool.sh
>

None of the other tools or scripts in the Kafka distribution have "tool" in
the name, so how about "kafka-connect-source-offsets.sh"?


>
> bq. bin/kafka-connect-source-offset-reset.sh
> --config=my-worker-config.properties --export
>
> From the table above, export mode is mentioned as required. However, either
> --export or --import is required.
> Better note this in the KIP.
>

Addressed in a few places. Hopefully it is more clear now.


>
> bq. but will remove the offsets for the partition with file "b"
>
> Please move the sample JSON below the above description.
>

Done.


>
> Cheers
>
> On Sun, Sep 10, 2017 at 11:12 AM, Randall Hauch <rh...@gmail.com> wrote:
>
> > Hi all,
> >
> > KIP-199 [1] describes a new tool that will allow Connect operators to
> read,
> > update, and remove offsets stored by Connect runtime. This capability has
> > been often asked for by Connect users. The proposal is simple but
> flexible.
> > Please review and add feedback.
> >
> > Best regards,
> >
> > Randall
> >
> > [1]
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 199%3A+Add+Kafka+Connect+offset+reset+tool
> >
>

Re: [DISCUSS] KIP-199: Add Kafka Connect offset reset tool

Posted by Ted Yu <yu...@gmail.com>.
bq. connector restart and the next message

The last part of the sentence seems to be incomplete.

bq. command line tool called kafka-connect-source-offset-reset.sh

From the description, the tool does more than resetting (e.g. deleting).
How about calling it kafka-connect-source-offset-tool.sh

bq. bin/kafka-connect-source-offset-reset.sh
--config=my-worker-config.properties --export

From the table above, export mode is mentioned as required. However, either
--export or --import is required.
Better note this in the KIP.

bq. but will remove the offsets for the partition with file "b"

Please move the sample JSON below the above description.

Cheers

On Sun, Sep 10, 2017 at 11:12 AM, Randall Hauch <rh...@gmail.com> wrote:

> Hi all,
>
> KIP-199 [1] describes a new tool that will allow Connect operators to read,
> update, and remove offsets stored by Connect runtime. This capability has
> been often asked for by Connect users. The proposal is simple but flexible.
> Please review and add feedback.
>
> Best regards,
>
> Randall
>
> [1]
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 199%3A+Add+Kafka+Connect+offset+reset+tool
>