You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apex.apache.org by Siyuan Hua <si...@datatorrent.com> on 2015/11/16 19:53:31 UTC

Support new major kafka release 0.9.0

I will be working on rewriting the kafka input operator.

Here is the ticket
https://malhar.atlassian.net/browse/MLHR-1904

Here is some comments on the ticket

The RC2 is out here
https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/

We will keep most features of the old input operator but the internal
mechanism will be changed, for example, using new API to refresh the
metadata
The bugs that will be fixed:

   - Synchronized offset checkpoint
   - Transient offsetmanager

New features:

   - Support customized partition schema
   - Default OffsetManager using new

Improvement

   - Add window id and application name to OffsetManager interface
   - Support multi-topic
   - Easy configuration


Please leave thoughts here or on the ticket. Thanks

Best,
Siyuan

Re: Support new major kafka release 0.9.0

Posted by amol kekre <am...@gmail.com>.
We do need the Kafka operator to be congnizant of Kafka versions. I am
assuming that is part of the rewrite. I added my comments to the jira.

Thks,
Amol


On Tue, Nov 24, 2015 at 4:43 PM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> Nice.
>
> On Tue, Nov 24, 2015 at 4:32 PM, Thomas Weise <th...@datatorrent.com>
> wrote:
>
> > The new API provides for explicit offset storage on the broker side (see
> > commitXXX methods):
> >
> >
> >
> http://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
> >
> > With this, we won't need the current offset manager approach.
> >
> > On Tue, Nov 24, 2015 at 1:32 PM, Pramod Immaneni <pramod@datatorrent.com
> >
> > wrote:
> >
> > > What support does the new API provide for offset management?
> > >
> > > On Tue, Nov 24, 2015 at 1:09 PM, Thomas Weise <th...@datatorrent.com>
> > > wrote:
> > >
> > > > This discussion applies to Kafka 0.8.x only?
> > > >
> > > > With the new consumer API, offset management can be delegated, we
> won't
> > > > need this component any longer. Each partition can record the offset
> > for
> > > > the committed window then.
> > > >
> > > > Thomas
> > > >
> > > > On Tue, Nov 24, 2015 at 12:59 PM, Siyuan Hua <siyuan@datatorrent.com
> >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > I need your idea for the design of OffsetManager for kafka input
> > > operator
> > > > >
> > > > > First of all, some background of OffsetManager and why we may need
> > it.
> > > > The
> > > > > OffsetManager is a plugin in kafka input operator for cutomized
> > offset
> > > > > management. The API will be called if the consumer offset(the
> message
> > > > that
> > > > > has been emitted, along with the window id) changes.
> > > > > OffsetManager is different from offset checkpointing, we still use
> > > > > checkpointed offset to recover node from failure.
> > > > >
> > > > > 2 reasons for the need of OffsetManager:
> > > > >   1) User might want to store offsets in their own way (hdfs,
> > > zookeeper,
> > > > > database, etc)
> > > > >   2) User might want to continue consuming at application restart.
> > > > >
> > > > > In the current version, the OffsetManager works in a central mode,
> > each
> > > > > partition only reports the offset(s) to Statslistener, the listener
> > > calls
> > > > > OffsetManager to update the offsets.
> > > > > The other possibility is make the OffsetManager work in a
> distributed
> > > > > mode.  Each partition update the offset(s) on its own.
> > > > >
> > > > > The distributed mode is more straightforward, but the developer
> needs
> > > to
> > > > > know it's distributed, you have to manage write from multiple
> nodes,
> > > > > collisions on your own. But also it's more real time at no risk of
> > > > failure
> > > > > of stats reporter
> > > > >
> > > > > Any input is welcome, thanks!
> > > > >
> > > > > Regards,
> > > > > Siyuan
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <
> siyuan@datatorrent.com
> > >
> > > > > wrote:
> > > > >
> > > > > > I will be working on rewriting the kafka input operator.
> > > > > >
> > > > > > Here is the ticket
> > > > > > https://malhar.atlassian.net/browse/MLHR-1904
> > > > > >
> > > > > > Here is some comments on the ticket
> > > > > >
> > > > > > The RC2 is out here
> > > > > > https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/
> > > > > >
> > > > > > We will keep most features of the old input operator but the
> > internal
> > > > > > mechanism will be changed, for example, using new API to refresh
> > the
> > > > > > metadata
> > > > > > The bugs that will be fixed:
> > > > > >
> > > > > >    - Synchronized offset checkpoint
> > > > > >    - Transient offsetmanager
> > > > > >
> > > > > > New features:
> > > > > >
> > > > > >    - Support customized partition schema
> > > > > >    - Default OffsetManager using new
> > > > > >
> > > > > > Improvement
> > > > > >
> > > > > >    - Add window id and application name to OffsetManager
> interface
> > > > > >    - Support multi-topic
> > > > > >    - Easy configuration
> > > > > >
> > > > > >
> > > > > > Please leave thoughts here or on the ticket. Thanks
> > > > > >
> > > > > > Best,
> > > > > > Siyuan
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: Support new major kafka release 0.9.0

Posted by Pramod Immaneni <pr...@datatorrent.com>.
Nice.

On Tue, Nov 24, 2015 at 4:32 PM, Thomas Weise <th...@datatorrent.com>
wrote:

> The new API provides for explicit offset storage on the broker side (see
> commitXXX methods):
>
>
> http://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html
>
> With this, we won't need the current offset manager approach.
>
> On Tue, Nov 24, 2015 at 1:32 PM, Pramod Immaneni <pr...@datatorrent.com>
> wrote:
>
> > What support does the new API provide for offset management?
> >
> > On Tue, Nov 24, 2015 at 1:09 PM, Thomas Weise <th...@datatorrent.com>
> > wrote:
> >
> > > This discussion applies to Kafka 0.8.x only?
> > >
> > > With the new consumer API, offset management can be delegated, we won't
> > > need this component any longer. Each partition can record the offset
> for
> > > the committed window then.
> > >
> > > Thomas
> > >
> > > On Tue, Nov 24, 2015 at 12:59 PM, Siyuan Hua <si...@datatorrent.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I need your idea for the design of OffsetManager for kafka input
> > operator
> > > >
> > > > First of all, some background of OffsetManager and why we may need
> it.
> > > The
> > > > OffsetManager is a plugin in kafka input operator for cutomized
> offset
> > > > management. The API will be called if the consumer offset(the message
> > > that
> > > > has been emitted, along with the window id) changes.
> > > > OffsetManager is different from offset checkpointing, we still use
> > > > checkpointed offset to recover node from failure.
> > > >
> > > > 2 reasons for the need of OffsetManager:
> > > >   1) User might want to store offsets in their own way (hdfs,
> > zookeeper,
> > > > database, etc)
> > > >   2) User might want to continue consuming at application restart.
> > > >
> > > > In the current version, the OffsetManager works in a central mode,
> each
> > > > partition only reports the offset(s) to Statslistener, the listener
> > calls
> > > > OffsetManager to update the offsets.
> > > > The other possibility is make the OffsetManager work in a distributed
> > > > mode.  Each partition update the offset(s) on its own.
> > > >
> > > > The distributed mode is more straightforward, but the developer needs
> > to
> > > > know it's distributed, you have to manage write from multiple nodes,
> > > > collisions on your own. But also it's more real time at no risk of
> > > failure
> > > > of stats reporter
> > > >
> > > > Any input is welcome, thanks!
> > > >
> > > > Regards,
> > > > Siyuan
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <siyuan@datatorrent.com
> >
> > > > wrote:
> > > >
> > > > > I will be working on rewriting the kafka input operator.
> > > > >
> > > > > Here is the ticket
> > > > > https://malhar.atlassian.net/browse/MLHR-1904
> > > > >
> > > > > Here is some comments on the ticket
> > > > >
> > > > > The RC2 is out here
> > > > > https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/
> > > > >
> > > > > We will keep most features of the old input operator but the
> internal
> > > > > mechanism will be changed, for example, using new API to refresh
> the
> > > > > metadata
> > > > > The bugs that will be fixed:
> > > > >
> > > > >    - Synchronized offset checkpoint
> > > > >    - Transient offsetmanager
> > > > >
> > > > > New features:
> > > > >
> > > > >    - Support customized partition schema
> > > > >    - Default OffsetManager using new
> > > > >
> > > > > Improvement
> > > > >
> > > > >    - Add window id and application name to OffsetManager interface
> > > > >    - Support multi-topic
> > > > >    - Easy configuration
> > > > >
> > > > >
> > > > > Please leave thoughts here or on the ticket. Thanks
> > > > >
> > > > > Best,
> > > > > Siyuan
> > > > >
> > > >
> > >
> >
>

Re: Support new major kafka release 0.9.0

Posted by Thomas Weise <th...@datatorrent.com>.
The new API provides for explicit offset storage on the broker side (see
commitXXX methods):

http://kafka.apache.org/090/javadoc/index.html?org/apache/kafka/clients/consumer/KafkaConsumer.html

With this, we won't need the current offset manager approach.

On Tue, Nov 24, 2015 at 1:32 PM, Pramod Immaneni <pr...@datatorrent.com>
wrote:

> What support does the new API provide for offset management?
>
> On Tue, Nov 24, 2015 at 1:09 PM, Thomas Weise <th...@datatorrent.com>
> wrote:
>
> > This discussion applies to Kafka 0.8.x only?
> >
> > With the new consumer API, offset management can be delegated, we won't
> > need this component any longer. Each partition can record the offset for
> > the committed window then.
> >
> > Thomas
> >
> > On Tue, Nov 24, 2015 at 12:59 PM, Siyuan Hua <si...@datatorrent.com>
> > wrote:
> >
> > > Hi all,
> > >
> > > I need your idea for the design of OffsetManager for kafka input
> operator
> > >
> > > First of all, some background of OffsetManager and why we may need it.
> > The
> > > OffsetManager is a plugin in kafka input operator for cutomized offset
> > > management. The API will be called if the consumer offset(the message
> > that
> > > has been emitted, along with the window id) changes.
> > > OffsetManager is different from offset checkpointing, we still use
> > > checkpointed offset to recover node from failure.
> > >
> > > 2 reasons for the need of OffsetManager:
> > >   1) User might want to store offsets in their own way (hdfs,
> zookeeper,
> > > database, etc)
> > >   2) User might want to continue consuming at application restart.
> > >
> > > In the current version, the OffsetManager works in a central mode, each
> > > partition only reports the offset(s) to Statslistener, the listener
> calls
> > > OffsetManager to update the offsets.
> > > The other possibility is make the OffsetManager work in a distributed
> > > mode.  Each partition update the offset(s) on its own.
> > >
> > > The distributed mode is more straightforward, but the developer needs
> to
> > > know it's distributed, you have to manage write from multiple nodes,
> > > collisions on your own. But also it's more real time at no risk of
> > failure
> > > of stats reporter
> > >
> > > Any input is welcome, thanks!
> > >
> > > Regards,
> > > Siyuan
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <si...@datatorrent.com>
> > > wrote:
> > >
> > > > I will be working on rewriting the kafka input operator.
> > > >
> > > > Here is the ticket
> > > > https://malhar.atlassian.net/browse/MLHR-1904
> > > >
> > > > Here is some comments on the ticket
> > > >
> > > > The RC2 is out here
> > > > https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/
> > > >
> > > > We will keep most features of the old input operator but the internal
> > > > mechanism will be changed, for example, using new API to refresh the
> > > > metadata
> > > > The bugs that will be fixed:
> > > >
> > > >    - Synchronized offset checkpoint
> > > >    - Transient offsetmanager
> > > >
> > > > New features:
> > > >
> > > >    - Support customized partition schema
> > > >    - Default OffsetManager using new
> > > >
> > > > Improvement
> > > >
> > > >    - Add window id and application name to OffsetManager interface
> > > >    - Support multi-topic
> > > >    - Easy configuration
> > > >
> > > >
> > > > Please leave thoughts here or on the ticket. Thanks
> > > >
> > > > Best,
> > > > Siyuan
> > > >
> > >
> >
>

Re: Support new major kafka release 0.9.0

Posted by Pramod Immaneni <pr...@datatorrent.com>.
What support does the new API provide for offset management?

On Tue, Nov 24, 2015 at 1:09 PM, Thomas Weise <th...@datatorrent.com>
wrote:

> This discussion applies to Kafka 0.8.x only?
>
> With the new consumer API, offset management can be delegated, we won't
> need this component any longer. Each partition can record the offset for
> the committed window then.
>
> Thomas
>
> On Tue, Nov 24, 2015 at 12:59 PM, Siyuan Hua <si...@datatorrent.com>
> wrote:
>
> > Hi all,
> >
> > I need your idea for the design of OffsetManager for kafka input operator
> >
> > First of all, some background of OffsetManager and why we may need it.
> The
> > OffsetManager is a plugin in kafka input operator for cutomized offset
> > management. The API will be called if the consumer offset(the message
> that
> > has been emitted, along with the window id) changes.
> > OffsetManager is different from offset checkpointing, we still use
> > checkpointed offset to recover node from failure.
> >
> > 2 reasons for the need of OffsetManager:
> >   1) User might want to store offsets in their own way (hdfs, zookeeper,
> > database, etc)
> >   2) User might want to continue consuming at application restart.
> >
> > In the current version, the OffsetManager works in a central mode, each
> > partition only reports the offset(s) to Statslistener, the listener calls
> > OffsetManager to update the offsets.
> > The other possibility is make the OffsetManager work in a distributed
> > mode.  Each partition update the offset(s) on its own.
> >
> > The distributed mode is more straightforward, but the developer needs to
> > know it's distributed, you have to manage write from multiple nodes,
> > collisions on your own. But also it's more real time at no risk of
> failure
> > of stats reporter
> >
> > Any input is welcome, thanks!
> >
> > Regards,
> > Siyuan
> >
> >
> >
> >
> >
> > On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <si...@datatorrent.com>
> > wrote:
> >
> > > I will be working on rewriting the kafka input operator.
> > >
> > > Here is the ticket
> > > https://malhar.atlassian.net/browse/MLHR-1904
> > >
> > > Here is some comments on the ticket
> > >
> > > The RC2 is out here
> > > https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/
> > >
> > > We will keep most features of the old input operator but the internal
> > > mechanism will be changed, for example, using new API to refresh the
> > > metadata
> > > The bugs that will be fixed:
> > >
> > >    - Synchronized offset checkpoint
> > >    - Transient offsetmanager
> > >
> > > New features:
> > >
> > >    - Support customized partition schema
> > >    - Default OffsetManager using new
> > >
> > > Improvement
> > >
> > >    - Add window id and application name to OffsetManager interface
> > >    - Support multi-topic
> > >    - Easy configuration
> > >
> > >
> > > Please leave thoughts here or on the ticket. Thanks
> > >
> > > Best,
> > > Siyuan
> > >
> >
>

Re: Support new major kafka release 0.9.0

Posted by Thomas Weise <th...@datatorrent.com>.
This discussion applies to Kafka 0.8.x only?

With the new consumer API, offset management can be delegated, we won't
need this component any longer. Each partition can record the offset for
the committed window then.

Thomas

On Tue, Nov 24, 2015 at 12:59 PM, Siyuan Hua <si...@datatorrent.com> wrote:

> Hi all,
>
> I need your idea for the design of OffsetManager for kafka input operator
>
> First of all, some background of OffsetManager and why we may need it. The
> OffsetManager is a plugin in kafka input operator for cutomized offset
> management. The API will be called if the consumer offset(the message that
> has been emitted, along with the window id) changes.
> OffsetManager is different from offset checkpointing, we still use
> checkpointed offset to recover node from failure.
>
> 2 reasons for the need of OffsetManager:
>   1) User might want to store offsets in their own way (hdfs, zookeeper,
> database, etc)
>   2) User might want to continue consuming at application restart.
>
> In the current version, the OffsetManager works in a central mode, each
> partition only reports the offset(s) to Statslistener, the listener calls
> OffsetManager to update the offsets.
> The other possibility is make the OffsetManager work in a distributed
> mode.  Each partition update the offset(s) on its own.
>
> The distributed mode is more straightforward, but the developer needs to
> know it's distributed, you have to manage write from multiple nodes,
> collisions on your own. But also it's more real time at no risk of failure
> of stats reporter
>
> Any input is welcome, thanks!
>
> Regards,
> Siyuan
>
>
>
>
>
> On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <si...@datatorrent.com>
> wrote:
>
> > I will be working on rewriting the kafka input operator.
> >
> > Here is the ticket
> > https://malhar.atlassian.net/browse/MLHR-1904
> >
> > Here is some comments on the ticket
> >
> > The RC2 is out here
> > https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/
> >
> > We will keep most features of the old input operator but the internal
> > mechanism will be changed, for example, using new API to refresh the
> > metadata
> > The bugs that will be fixed:
> >
> >    - Synchronized offset checkpoint
> >    - Transient offsetmanager
> >
> > New features:
> >
> >    - Support customized partition schema
> >    - Default OffsetManager using new
> >
> > Improvement
> >
> >    - Add window id and application name to OffsetManager interface
> >    - Support multi-topic
> >    - Easy configuration
> >
> >
> > Please leave thoughts here or on the ticket. Thanks
> >
> > Best,
> > Siyuan
> >
>

Re: Support new major kafka release 0.9.0

Posted by Siyuan Hua <si...@datatorrent.com>.
Hi all,

I need your idea for the design of OffsetManager for kafka input operator

First of all, some background of OffsetManager and why we may need it. The
OffsetManager is a plugin in kafka input operator for cutomized offset
management. The API will be called if the consumer offset(the message that
has been emitted, along with the window id) changes.
OffsetManager is different from offset checkpointing, we still use
checkpointed offset to recover node from failure.

2 reasons for the need of OffsetManager:
  1) User might want to store offsets in their own way (hdfs, zookeeper,
database, etc)
  2) User might want to continue consuming at application restart.

In the current version, the OffsetManager works in a central mode, each
partition only reports the offset(s) to Statslistener, the listener calls
OffsetManager to update the offsets.
The other possibility is make the OffsetManager work in a distributed
mode.  Each partition update the offset(s) on its own.

The distributed mode is more straightforward, but the developer needs to
know it's distributed, you have to manage write from multiple nodes,
collisions on your own. But also it's more real time at no risk of failure
of stats reporter

Any input is welcome, thanks!

Regards,
Siyuan





On Mon, Nov 16, 2015 at 10:53 AM, Siyuan Hua <si...@datatorrent.com> wrote:

> I will be working on rewriting the kafka input operator.
>
> Here is the ticket
> https://malhar.atlassian.net/browse/MLHR-1904
>
> Here is some comments on the ticket
>
> The RC2 is out here
> https://people.apache.org/~junrao/kafka-0.9.0.0-candidate2/
>
> We will keep most features of the old input operator but the internal
> mechanism will be changed, for example, using new API to refresh the
> metadata
> The bugs that will be fixed:
>
>    - Synchronized offset checkpoint
>    - Transient offsetmanager
>
> New features:
>
>    - Support customized partition schema
>    - Default OffsetManager using new
>
> Improvement
>
>    - Add window id and application name to OffsetManager interface
>    - Support multi-topic
>    - Easy configuration
>
>
> Please leave thoughts here or on the ticket. Thanks
>
> Best,
> Siyuan
>