Posted to dev@nifi.apache.org by "Adams, Jeremiah" <je...@pearson.com> on 2015/07/27 23:42:37 UTC

GetKafka Processor and Hardcoded Kafka Consumer Configs

The GetKafka processor has a couple of Kafka Consumer Config values that
are hard-coded.

    props.setProperty("auto.commit.enable", "true"); // just be explicit
    props.setProperty("auto.offset.reset", "smallest");

These should be configurable property values in the Processor. The most
notable for me is "auto.offset.reset": smallest vs. largest has
implications for fault tolerance strategies.

It would be best to expose all of the available Kafka Consumer Config
properties. If these change between Kafka versions, though, it would create
maintenance work for the Processors.

Another option would be to allow ad-hoc property values, so that the end user
just supplies the Kafka config values they want to override.
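
A minimal sketch of the ad-hoc override idea, under stated assumptions: the
class and method names below (ConsumerConfigSketch, defaults, withOverrides)
are hypothetical and are not taken from the GetKafka source. The only values
from the original code are the two property keys and their hard-coded
settings, which are kept as defaults that user-supplied entries may replace.

```java
import java.util.Map;
import java.util.Properties;

/** Hypothetical helper, not NiFi code: defaults plus user-supplied overrides. */
public class ConsumerConfigSketch {

    /** Returns the two values that are hard-coded today, used here as defaults. */
    static Properties defaults() {
        Properties props = new Properties();
        props.setProperty("auto.commit.enable", "true");
        props.setProperty("auto.offset.reset", "smallest");
        return props;
    }

    /** Starts from the defaults, then lets each user entry replace or extend them. */
    static Properties withOverrides(Map<String, String> userSupplied) {
        Properties props = defaults();
        userSupplied.forEach(props::setProperty);
        return props;
    }

    public static void main(String[] args) {
        // Only the key the user names is changed; the other default survives.
        Properties props = withOverrides(Map.of("auto.offset.reset", "largest"));
        System.out.println(props.getProperty("auto.commit.enable")); // prints "true"
        System.out.println(props.getProperty("auto.offset.reset"));  // prints "largest"
    }
}
```

Because unknown keys are passed through untouched, a scheme like this would
not need updating when the list of consumer configs changes between Kafka
versions. The trade-off is that a mistyped key is not caught here.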


*Jeremiah Adams*

Senior Software Developer
Pearson

2154 East Commons Ave.
Suite 400
Centennial, CO 80122


Always Learning
Learn more at www.pearson.com

Re: GetKafka Processor and Hardcoded Kafka Consumer Configs

Posted by "Adams, Jeremiah" <je...@pearson.com>.
I missed the kafka.offset attribute. That is what I needed. Thanks Mark.

On Tue, Jul 28, 2015 at 9:28 AM, Mark Payne <ma...@hotmail.com> wrote:

> Jeremiah,
>
> Totally understand now. We can certainly add a property that indicates
> whether or not to commit the offsets. We should probably also document
> (at a very high level) the use-case that you are describing as an example
> of why you may want to not commit the offsets. I will update the ticket to
> include this.
>
> Regarding the separate enhancement: when you say "the last written offset"
> are you referring to when GetKafka writes the offset to ZooKeeper? I do not
> believe that information is exposed by their "High-level consumer." It's
> probably possible if we were to change to the "simple consumer" API, but
> that interface is extremely different so it unfortunately isn't a simple
> change.
>
> The FlowFiles that are received, though, do have a "kafka.offset"
> attribute, which indicates the offset of that individual message, if that
> helps?
>
> Thanks
> -Mark

RE: GetKafka Processor and Hardcoded Kafka Consumer Configs

Posted by Mark Payne <ma...@hotmail.com>.
Jeremiah,

Totally understand now. We can certainly add a property that indicates whether or not to commit the offsets.
We should probably also document (at a very high level) the use-case that you are describing as an example
of why you may want to not commit the offsets. I will update the ticket to include this.

Regarding the separate enhancement: when you say "the last written offset" are you referring to when GetKafka
writes the offset to ZooKeeper? I do not believe that information is exposed by their "High-level consumer."
It's probably possible if we were to change to the "simple consumer" API, but that interface is extremely different
so it unfortunately isn't a simple change.

The FlowFiles that are received, though, do have a "kafka.offset" attribute, which indicates the offset of that individual
message, if that helps?

Thanks
-Mark
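
A rough sketch of how a downstream step might derive a "last offset" from the
"kafka.offset" attribute mentioned above. To stay self-contained it uses a
plain Map<String, String> as a stand-in for a FlowFile's attributes, and the
names LastOffsetSketch and highestOffset are hypothetical. Kafka offsets are
only comparable within one partition, so the sketch assumes every attribute
map comes from the same partition.

```java
import java.util.List;
import java.util.Map;
import java.util.OptionalLong;

/** Hypothetical helper: finds the highest "kafka.offset" among attribute maps. */
public class LastOffsetSketch {

    static final String OFFSET_ATTRIBUTE = "kafka.offset";

    /**
     * Returns the largest offset found, or an empty OptionalLong when no map
     * carries the attribute. Assumes all maps belong to a single partition.
     */
    static OptionalLong highestOffset(List<Map<String, String>> attributeMaps) {
        return attributeMaps.stream()
                .map(attrs -> attrs.get(OFFSET_ATTRIBUTE))
                .filter(value -> value != null)   // skip maps without the attribute
                .mapToLong(Long::parseLong)
                .max();
    }

    public static void main(String[] args) {
        List<Map<String, String>> batch = List.of(
                Map.of("kafka.offset", "41"),
                Map.of("kafka.offset", "42"),
                Map.of("filename", "no-offset-here"));
        System.out.println(highestOffset(batch).getAsLong()); // prints 42
    }
}
```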


----------------------------------------
> Date: Tue, 28 Jul 2015 08:56:21 -0600
> Subject: Re: GetKafka Processor and Hardcoded Kafka Consumer Configs
> From: jeremiah.adams@pearson.com
> To: dev@nifi.apache.org
>
> Regarding auto.commit.enable: we had a scenario during our last
> deploy in which we did not commit the offsets we read at all. This is
> atypical. It arises in a Lambda-like architecture in which we use
> S3 to provide historical data to repopulate the near real-time datastore
> during a deploy.
>
> Mostly, I think that the user experience would be better if we had complete
> control, in the GetKafka Processor, over the consumer configs listed here:
> http://kafka.apache.org/documentation.html#consumerconfigs.
> There may be implementation details that make this impossible, but it would
> be the best case. I think it is probably safe to say the same about the
> Kafka Producer, but I have not run into any blockers as-is. I have added
> this to the Jira ticket.
>
> Also, a separate enhancement:
>
> I see a need to pass along the last written offset to subsequent Processors
> in a flow. I don't know if this is even possible; I didn't look that
> closely at the code. It could be useful to have the option to pass the
> last offset along the flow as metadata. We could then pass around FlowFile
> data indexed by last offset. Dunno if this is worth exploring, as it may be
> unique to our architecture.

Re: GetKafka Processor and Hardcoded Kafka Consumer Configs

Posted by "Adams, Jeremiah" <je...@pearson.com>.
Regarding auto.commit.enable: we had a scenario during our last
deploy in which we did not commit the offsets we read at all. This is
atypical. It arises in a Lambda-like architecture in which we use
S3 to provide historical data to repopulate the near real-time datastore
during a deploy.

Mostly, I think that the user experience would be better if we had complete
control, in the GetKafka Processor, over the consumer configs listed here:
http://kafka.apache.org/documentation.html#consumerconfigs.
There may be implementation details that make this impossible, but it would
be the best case. I think it is probably safe to say the same about the
Kafka Producer, but I have not run into any blockers as-is. I have added
this to the Jira ticket.

Also, a separate enhancement:

I see a need to pass along the last written offset to subsequent Processors
in a flow. I don't know if this is even possible; I didn't look that
closely at the code. It could be useful to have the option to pass the
last offset along the flow as metadata. We could then pass around FlowFile
data indexed by last offset. Dunno if this is worth exploring, as it may be
unique to our architecture.


On Mon, Jul 27, 2015 at 6:14 PM, Mark Payne <ma...@hotmail.com> wrote:

> Jeremiah,
>
> We can certainly enable the "auto.offset.reset" to be configurable. Not
> sure how making the "auto.commit.enable" configurable would work.
> Are you thinking that another property would be added to indicate how
> often to commit? Or would it work completely differently? Just need that
> fleshed out a bit more.
>
> I do like the suggestion of exposing the config properties as user-defined
> properties.
>
> I have created a ticket to track this information:
> https://issues.apache.org/jira/browse/NIFI-791
>
> Please feel free to update the ticket with any relevant information as you
> think of it.
>
> Thanks!
> -Mark

RE: GetKafka Processor and Hardcoded Kafka Consumer Configs

Posted by Mark Payne <ma...@hotmail.com>.
Jeremiah,

We can certainly enable the "auto.offset.reset" to be configurable. Not sure how making the "auto.commit.enable" configurable would work.
Are you thinking that another property would be added to indicate how often to commit? Or would it work completely differently? Just need that
fleshed out a bit more.

I do like the suggestion of exposing the config properties as user-defined properties. 

I have created a ticket to track this information: https://issues.apache.org/jira/browse/NIFI-791

Please feel free to update the ticket with any relevant information as you think of it.

Thanks!
-Mark
