Posted to users@kafka.apache.org by Edward Smith <es...@stardotstar.org> on 2012/04/13 19:04:52 UTC

Consumer State Description in design.html

I didn't want to open up a bug unless there was some concurrence on
this.   Please review the change below and see if I'm just
misunderstanding things or not.  This paragraph in the doc took me a
long time to digest because it was describing the contrib/hadoop
consumer and not how simpleconsumer or consoleconsumer work:

Consumer State (the second heading like this in the file)

In Kafka, the consumers are responsible for maintaining state
information on what has been consumed.  The core Kafka consumers write
their state data to ZooKeeper.

However, it may be beneficial for consumers to write state data into
the same datastore where they are writing the results of their
processing.  For example, the consumer may simply be entering some
aggregate value into a centralized...... (rest of section remains the
same from here)
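
The pattern that paragraph describes — committing the consumer's offset in the same transaction as the processed results, so the two can never drift apart — might look roughly like this. This is only a sketch: Python with sqlite3 standing in for whatever datastore the consumer writes to, and the table and column names are invented for illustration.

```python
import sqlite3

def process_batch(conn, messages, next_offset):
    # One transaction commits both the aggregate result and the offset
    # that produced it, so a crash can never leave them out of sync.
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO aggregates (value) VALUES (?)",
            (sum(messages),),
        )
        conn.execute(
            "UPDATE consumer_state SET next_offset = ?",
            (next_offset,),
        )

def last_offset(conn):
    # On restart, resume from the offset saved with the last result.
    return conn.execute("SELECT next_offset FROM consumer_state").fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aggregates (value INTEGER)")
conn.execute("CREATE TABLE consumer_state (next_offset INTEGER)")
conn.execute("INSERT INTO consumer_state VALUES (0)")

process_batch(conn, [1, 2, 3], next_offset=3)
print(last_offset(conn))  # 3
```

The point of the design-doc paragraph is exactly this atomicity: if the offset lived in ZooKeeper instead, the result write and the offset write could succeed or fail independently.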

Ed

On Fri, Apr 13, 2012 at 12:02 PM, Jun Rao <ju...@gmail.com> wrote:
> Currently, as you are iterating messages returned by SimpleConsumer, you
> also get the offset for the next message. In the map, you can just run for
> 30 mins and save the next offset for the next run.
>
> Thanks,
>
> Jun
>
> On Fri, Apr 13, 2012 at 1:01 AM, R S <my...@gmail.com> wrote:
>
>> Hi,
>>
>> I looked at hadoop-consumer, which fetches data directly from the Kafka
>> broker. But from what I understand, it is based on min and max offsets,
>> and map tasks complete once they reach the maximum offset for a given topic.
>>
>> In our use case we would not know the max offset beforehand. Instead,
>> we want the map to keep reading data from a min offset and roll over every
>> 30 mins. At the 30th minute we would again generate the offsets to be used
>> for the next run.
>>
>> Any suggestions would be helpful.
>>
>> regards,
>> rks
>>
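
Jun's approach — iterate, keep track of the offset of the *next* message as you go, and persist it when the run's time budget expires — reduces to something like the following. This is a sketch only: a plain Python list stands in for the messages a SimpleConsumer fetch would return, and the 30-minute budget is shrunk to a second for illustration.

```python
import time

def run_once(stream, start_offset, budget_secs, state):
    """Consume from start_offset until the time budget expires,
    then record the next offset so the following run can resume."""
    deadline = time.time() + budget_secs
    offset = start_offset
    while offset < len(stream) and time.time() < deadline:
        message = stream[offset]      # process the message here
        offset += 1                   # offset of the *next* message
    state["next_offset"] = offset     # saved for the next run
    return offset

stream = ["m0", "m1", "m2", "m3", "m4"]
state = {"next_offset": 0}

# First "30-minute" run (budget shrunk to 1s; it drains the stream).
run_once(stream, state["next_offset"], budget_secs=1.0, state=state)
print(state["next_offset"])  # 5

# More messages arrive; the next run resumes where the last one stopped.
stream += ["m5", "m6"]
run_once(stream, state["next_offset"], budget_secs=1.0, state=state)
print(state["next_offset"])  # 7
```

With the real SimpleConsumer, `state` would be persisted somewhere durable (a file, HDFS, a database) between map runs rather than kept in memory.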

Re: Consumer State Description in design.html

Posted by Jun Rao <ju...@gmail.com>.
Got it. Updated the website in svn. It should be reflected externally soon.

Thanks,

Jun

On Fri, Apr 13, 2012 at 12:01 PM, Edward Smith <es...@stardotstar.org> wrote:

> Ack!  No!  I'm sorry, I'm probably just confusing the issue.  I just
> want to clarify the docs, not change the functionality.
>
> Maybe I'll try to sum it up the way I would write the jira:
>
> "Design.html is confusing to new users when it comes to where offset
> data is stored by consumers."
>

Re: Consumer State Description in design.html

Posted by Joel Koshy <jj...@gmail.com>.
Jun, I think Ed is suggesting a good improvement to the design doc: line
203 on
http://svn.apache.org/viewvc/incubator/kafka/site/design.html?view=markup

That paragraph does seem to mix up the discussion between the high-level
consumer and consumers that maintain their own state for more fine-grained
"rewindability". At least, the first two lines of that paragraph seem to be
talking about the high-level consumer, but not very clearly.

Thanks,

Joel

On Fri, Apr 13, 2012 at 12:01 PM, Edward Smith <es...@stardotstar.org> wrote:

> Ack!  No!  I'm sorry, I'm probably just confusing the issue.  I just
> want to clarify the docs, not change the functionality.
>
> Maybe I'll try to sum it up the way I would write the jira:
>
> "Design.html is confusing to new users when it comes to where offset
> data is stored by consumers."

Re: Consumer State Description in design.html

Posted by Edward Smith <es...@stardotstar.org>.
Ack!  No!  I'm sorry, I'm probably just confusing the issue.  I just
want to clarify the docs, not change the functionality.

Maybe I'll try to sum it up the way I would write the jira:

"Design.html is confusing to new users when it comes to where offset
data is stored by consumers."


On Fri, Apr 13, 2012 at 2:48 PM, Jun Rao <ju...@gmail.com> wrote:
> Ed,
>
> It seems that you are proposing a pluggable consumer offset store. We don't
> have that now. Could you open a jira for that?
>
> Thanks,
>
> Jun

Re: Consumer State Description in design.html

Posted by Jun Rao <ju...@gmail.com>.
Ed,

It seems that you are proposing a pluggable consumer offset store. We don't
have that now. Could you open a jira for that?

Thanks,

Jun

On Fri, Apr 13, 2012 at 11:27 AM, Edward Smith <es...@stardotstar.org> wrote:

> Jun,
>
> Let me try to rephrase to see if I can get this point across more clearly.
>
> I've been exploring the design by running the console tools.  The
> console consumer stores offset data in ZK.   This appears to be the
> default behavior in a Kafka deployment.  For example, if you skip down
> to "Consumers and Consumer Groups", it says that offsets are stored in
> ZK.
>
> This paragraph that I want to change, is basically describing an
> alternative technique of tracking offsets.  It has been confusing to
> me as I've tried to understand the design of Kafka, so I want to see
> if we can clarify it somehow.
>
> Ed

Re: Consumer State Description in design.html

Posted by Edward Smith <es...@stardotstar.org>.
Jun,

Let me try to rephrase to see if I can get this point across more clearly.

I've been exploring the design by running the console tools.  The
console consumer stores offset data in ZK.   This appears to be the
default behavior in a Kafka deployment.  For example, if you skip down
to "Consumers and Consumer Groups", it says that offsets are stored in
ZK.
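
For context, the state the console consumer keeps in ZK lives under a per-consumer-group path; in the 0.7-era layout it looks roughly like this (node names reproduced from the design doc as best I recall, so treat the exact shape as illustrative):

```
/consumers/<group>/ids/<consumer_id>                         # consumer registry
/consumers/<group>/owners/<topic>/<partition>                # partition ownership
/consumers/<group>/offsets/<topic>/<broker_id>-<partition>   # next offset to consume
```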

This paragraph that I want to change, is basically describing an
alternative technique of tracking offsets.  It has been confusing to
me as I've tried to understand the design of Kafka, so I want to see
if we can clarify it somehow.

Ed

On Fri, Apr 13, 2012 at 1:39 PM, Jun Rao <ju...@gmail.com> wrote:
> Ed,
>
> The design page only describes how the high level consumer (which most
> people use) works. The high level consumer currently doesn't expose
> offsets. Hadoop uses the low level consumer (SimpleConsumer), which is not
> described. We can have a wiki describing it and put your content there.
>
> Thanks,
>
> Jun

Re: Consumer State Description in design.html

Posted by Jun Rao <ju...@gmail.com>.
Ed,

The design page only describes how the high level consumer (which most
people use) works. The high level consumer currently doesn't expose
offsets. Hadoop uses the low level consumer (SimpleConsumer), which is not
described. We can have a wiki describing it and put your content there.

Thanks,

Jun

On Fri, Apr 13, 2012 at 10:24 AM, Edward Smith <es...@stardotstar.org> wrote:

> Sorry.... here it is with more clarity:
>
> Basically I'm adding to the beginning of the 2nd section titled "Consumer
> State"
>
> ----------------------------------------
> <h3>Consumer State</h3> (the second heading like this in the file)
> <p>
> In Kafka, the consumers are responsible for maintaining state
> information on what has been consumed.  The core Kafka consumers
> write their state data to ZooKeeper.
> </p>
> <p>
> However, it may be beneficial for consumers to write state data into
> the same datastore where they are writing the results of their
> processing.  For example, the consumer may simply be entering some
> aggregate value into a centralized......
> ..
> (rest of section remains the same from here)
> ..
> </p>
> ------------------------------------------
>

Re: Consumer State Description in design.html

Posted by Edward Smith <es...@stardotstar.org>.
Sorry... here it is with more clarity:

Basically, I'm adding to the beginning of the 2nd section titled "Consumer State".

----------------------------------------
<h3>Consumer State</h3> (the second heading like this in the file)
<p>
In Kafka, the consumers are responsible for maintaining state
information on what has been consumed.  The core Kafka consumers
write their state data to ZooKeeper.
</p>
<p>
However, it may be beneficial for consumers to write state data into
the same datastore where they are writing the results of their
processing.  For example, the consumer may simply be entering some
aggregate value into a centralized......
..
(rest of section remains the same from here)
..
</p>
------------------------------------------
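The proposed paragraph's point (letting a consumer keep its state in the same datastore where it writes its results, so the offset and the output commit together) can be sketched as below. SQLite stands in for the centralized store, and the table names and `apply_batch` helper are illustrative assumptions, not anything from Kafka itself:

```python
import sqlite3

def init_store(conn):
    """Output table and offset table live in the SAME database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS aggregate (key TEXT PRIMARY KEY, value INTEGER)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets (topic TEXT PRIMARY KEY, next_offset INTEGER)")

def last_offset(conn, topic):
    """Where the previous run left off, or 0 for a fresh start."""
    row = conn.execute(
        "SELECT next_offset FROM offsets WHERE topic = ?", (topic,)).fetchone()
    return row[0] if row else 0

def apply_batch(conn, topic, messages, next_offset):
    """Commit the aggregate update and the new offset in ONE transaction,
    so a crash can never record one without the other."""
    with conn:  # sqlite3 connection as context manager = a single transaction
        for key, delta in messages:
            conn.execute(
                "INSERT INTO aggregate (key, value) VALUES (?, ?) "
                "ON CONFLICT(key) DO UPDATE SET value = value + excluded.value",
                (key, delta))
        conn.execute(
            "INSERT INTO offsets (topic, next_offset) VALUES (?, ?) "
            "ON CONFLICT(topic) DO UPDATE SET next_offset = excluded.next_offset",
            (topic, next_offset))
```

Because both writes share one transaction, replaying a batch after a crash is safe: either the batch and its offset both landed, or neither did.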


On Fri, Apr 13, 2012 at 1:16 PM, Jun Rao <ju...@gmail.com> wrote:
> Ed,
>
> I don't see the change you want to make. Apache mailing list doesn't take
> attachments. If you have attachments, the easiest way is probably to attach
> that to a jira.
>
> Thanks,
>
> Jun
>
> On Fri, Apr 13, 2012 at 10:04 AM, Edward Smith <es...@stardotstar.org>wrote:
>
>> I didn't want to open up a bug unless there was some concurrence on
>> this.   Please review the change below and see if I'm just
>> misunderstanding things or not.  This paragraph in the doc took me a
>> long time to digest because it was describing the contrib/hadoop
>> consumer and not how simpleconsumer or consoleconsumer work:
>>
>> Consumer State (the second heading like this in the file)
>>
>> In Kafka, the consumers are responsible for maintaining state
>> information on what has been consumed.  The core Kafka consumers write
>> their state data to zookeeper.
>>
>> However, it may be beneficial for consumers to write state data into
>> the same datastore where they are writing the results of their
>> processing.  For example, the consumer may simply be entering some
>> aggregate value into a centralized...... (rest of section remains the
>> same from here)
>>
>> Ed
>>
>> On Fri, Apr 13, 2012 at 12:02 PM, Jun Rao <ju...@gmail.com> wrote:
>> > Currently, as you are iterating messages returned by SimpleConsumer, you
>> > also get the offset for the next message. In the map, you can just run
>> for
>> > 30 mins and save the next offset for the next run.
>> >
>> > Thanks,
>> >
>> > Jun
>> >
>> > On Fri, Apr 13, 2012 at 1:01 AM, R S <my...@gmail.com> wrote:
>> >
>> >> Hi ,
>> >>
>> >> I looked at hadoop-consumer , which fetches data directly from the kafka
>> >> broker . But from what i understand it is based on min and max offset
>> and
>> >> map task complete once they reach the maximum offset for a given topic .
>> >>
>> >> In our use case we would not know about the max offset before hand.
>> Instead
>> >> we want map to keep reading data from a min offset and roll over every
>> 30
>> >> mins . At 30th min we would again generate the offsets which would be
>> used
>> >> for the next run.
>> >>
>> >> any suggestions would be helpful .
>> >>
>> >> regards,
>> >> rks
>> >>
>>
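Jun's suggestion in the quoted thread above (run for a fixed window, then save the next offset for the following run) can be sketched as below. `fetch_batch`, the checkpoint file name, and the windowing logic are all illustrative assumptions, not the real Kafka SimpleConsumer API:

```python
import json
import os
import time

CHECKPOINT = "consumer.offset"   # hypothetical checkpoint file (assumption)
RUN_SECONDS = 30 * 60            # roll over every 30 minutes

def fetch_batch(offset):
    """Hypothetical stand-in for a SimpleConsumer fetch: returns the
    messages at `offset` plus the offset of the next message."""
    messages = ["msg-%d" % offset]
    return messages, offset + len(messages)

def load_offset():
    """Resume from the offset saved by the previous run, or start at 0."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)["next_offset"]
    return 0

def save_offset(next_offset):
    with open(CHECKPOINT, "w") as f:
        json.dump({"next_offset": next_offset}, f)

def run_once(deadline_seconds=RUN_SECONDS):
    """One bounded run: consume until the deadline, then checkpoint."""
    offset = load_offset()
    deadline = time.time() + deadline_seconds
    while time.time() < deadline:
        messages, offset = fetch_batch(offset)
        for message in messages:
            pass                 # process the message here
    save_offset(offset)          # the next run picks up from here
    return offset
```

There is no need to know the maximum offset in advance: each run simply reads forward from the checkpoint until its window closes, then records where the next run should start.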

Re: Consumer State Description in design.html

Posted by Jun Rao <ju...@gmail.com>.
Ed,

I don't see the change you want to make. The Apache mailing list doesn't take
attachments. If you have an attachment, the easiest way is probably to attach
it to a JIRA.

Thanks,

Jun

On Fri, Apr 13, 2012 at 10:04 AM, Edward Smith <es...@stardotstar.org>wrote:

> I didn't want to open up a bug unless there was some concurrence on
> this.   Please review the change below and see if I'm just
> misunderstanding things or not.  This paragraph in the doc took me a
> long time to digest because it was describing the contrib/hadoop
> consumer and not how simpleconsumer or consoleconsumer work:
>
> Consumer State (the second heading like this in the file)
>
> In Kafka, the consumers are responsible for maintaining state
> information on what has been consumed.  The core Kafka consumers write
> their state data to zookeeper.
>
> However, it may be beneficial for consumers to write state data into
> the same datastore where they are writing the results of their
> processing.  For example, the consumer may simply be entering some
> aggregate value into a centralized...... (rest of section remains the
> same from here)
>
> Ed
>
> On Fri, Apr 13, 2012 at 12:02 PM, Jun Rao <ju...@gmail.com> wrote:
> > Currently, as you are iterating messages returned by SimpleConsumer, you
> > also get the offset for the next message. In the map, you can just run
> for
> > 30 mins and save the next offset for the next run.
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Apr 13, 2012 at 1:01 AM, R S <my...@gmail.com> wrote:
> >
> >> Hi ,
> >>
> >> I looked at hadoop-consumer , which fetches data directly from the kafka
> >> broker . But from what i understand it is based on min and max offset
> and
> >> map task complete once they reach the maximum offset for a given topic .
> >>
> >> In our use case we would not know about the max offset before hand.
> Instead
> >> we want map to keep reading data from a min offset and roll over every
> 30
> >> mins . At 30th min we would again generate the offsets which would be
> used
> >> for the next run.
> >>
> >> any suggestions would be helpful .
> >>
> >> regards,
> >> rks
> >>
>