Posted to user@storm.apache.org by Nilesh Chhapru <ni...@ugamsolutions.com> on 2014/10/28 07:15:39 UTC
Issues with storm kafka integration
Hi Nathan,
We are using storm Kafka integration where a Spout reads from a Kafka topic.
Following are the versions of storm, Kafka and zookeeper we are using.
Storm : apache-storm-0.9.2-incubating
Kafka : kafka_2.8.0-0.8.1.1
Zookeeper : zookeeper-3.4.6
I am facing the following issues at the spout.
1) Messages get failed even though the average time taken is less than the topology.message.timeout.secs value, and we aren't getting any exceptions at any of the bolts.
2) The topology finally emits to a Kafka producer, i.e. some other topic, but the messages are getting duplicated because of replays.
3) The consumer group isn't working properly for the storm Kafka integration.
a. When we give the same group id to the Kafka consumers of different topologies, both still read the same messages.
b. If we have 2 different consumers with different consumer group ids in different topologies it works fine if both topologies are deployed at the same time, but not if we deploy one of them after some of the messages are already loaded in the topic and read by the first topology.
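Point 2 above is consistent with storm's at-least-once guarantee rather than a storm-kafka bug: any tuple that fails or times out is re-emitted by the spout, so the output topic sees duplicates unless something downstream de-duplicates. A minimal, illustrative sketch (plain Python, all names hypothetical; not storm code):

```python
# Sketch: at-least-once delivery re-emits failed tuples, so the producing
# side sees duplicates unless it de-duplicates on a stable message id.
def process_stream(messages, emit):
    seen = set()  # in production this would need a bounded/persistent store
    for msg_id, payload in messages:
        if msg_id in seen:
            continue  # duplicate caused by a replay; drop it
        seen.add(msg_id)
        emit(payload)

out = []
# message id 2 is replayed once, e.g. after a timeout
stream = [(1, "a"), (2, "b"), (2, "b"), (3, "c")]
process_stream(stream, out.append)
# out is ["a", "b", "c"]; the replayed copy of id 2 was dropped
```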
Kindly help me with the above points, as this is hampering the overall scope of the project as well as the timelines.
Do call or email in case you need any other information.
Nilesh Chhapru,
Phone: +91 9619030491
________________________________
---------------------------------------------------------------------------------------Disclaimer----------------------------------------------------------------------------------------------
****Opinions expressed in this e-mail are those of the author and do not necessarily represent those of Ugam. Ugam does not accept any responsibility or liability for it. This e-mail message may contain proprietary, confidential or legally privileged information for the sole use of the person or entity to whom this message was originally addressed. Any review, re-transmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you have received this e-mail in error, please delete it and all attachments from any servers, hard drives or any other media.
Warning: Sufficient measures have been taken to scan any presence of viruses however the recipient should check this email and any attachments for the presence of viruses. Ugam accepts no liability for any damage caused by any virus transmitted by this email. ****
Re: Issues with storm kafka integration
Posted by "P. Taylor Goetz" <pt...@gmail.com>.
Nilesh,
Can you post the code you are using to set up the Kafka spout instances? Your storm configuration would help as well.
-Taylor
On Oct 31, 2014, at 4:04 AM, Nilesh Chhapru <ni...@ugamsolutions.com> wrote:
> Hi All,
>
> Any update on below email, really stuck with this big time now :(
>
> Regards,
> Nilesh Chhapru.
RE: Issues with storm kafka integration
Posted by Nilesh Chhapru <ni...@ugamsolutions.com>.
Hi All,
Any update on below email, really stuck with this big time now :(
Regards,
Nilesh Chhapru.
Re: Issues Storm - Kafka
Posted by Harsha <st...@harsha.io>.
Nilesh, that is what I am saying: if you use the same consumer group for
both topologies, the second will pick up the offset from the T1
topology. If you want T1 and T2 to read the same topic independently,
use different consumer groups. -Harsha
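Harsha's point about offsets can be pictured with a toy model of group-scoped offsets (not the kafka client API): the stored offset is keyed by consumer group, so a second reader in the same group resumes where the first stopped, while a different group starts from its own position.

```python
# Toy model of kafka consumer-group offsets: offsets are stored per
# group, so consumers sharing a group id share one read position.
class Topic:
    def __init__(self):
        self.log = []       # the message log
        self.offsets = {}   # group id -> next offset to read

    def append(self, msg):
        self.log.append(msg)

    def poll(self, group):
        start = self.offsets.get(group, 0)
        msgs = self.log[start:]
        self.offsets[group] = len(self.log)  # commit the new offset
        return msgs

kt1 = Topic()
for m in ["m1", "m2", "m3"]:
    kt1.append(m)

t1 = kt1.poll("group-A")       # first topology reads everything
t2_same = kt1.poll("group-A")  # same group: offset already advanced
t2_new = kt1.poll("group-B")   # different group: reads from the start
# t1 == ["m1", "m2", "m3"], t2_same == [], t2_new == ["m1", "m2", "m3"]
```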
On Thu, Nov 6, 2014, at 10:28 PM, Nilesh Chhapru wrote:
> Harsha,
>
> I wanted the following scenario, which isn't working if I use the
> simple consumer.
>
> 1) Topology-1 (T1) reads a kafka topic (KT1).
> 2) Topology-2 (T2) reads the same topic (KT1).
>
> But the two topologies are not deployed at the same time. When T1 is
> deployed it starts reading KT1, but if I bring up T2 later it doesn't
> read from the start or from where it last left off, because the offset
> has moved on while T1 was reading the topic. I know we can set the T2
> spout's offset to read from the beginning, but I can't do that, as it
> must not re-read everything from the start every time; it has to
> resume from where it stopped when it was last un-deployed.
>
> I read in several blogs that zookeeper saves the offset against the
> consumer group id, hence I wanted to use the same.
>
> *Regards*,
> *Nilesh Chhapru.*
>
> *From:* Harsha [mailto:storm@harsha.io]
>
> *Sent:* 07 November 2014 12:13 AM
> *To:* user@storm.apache.org
> *Subject:* Re: Issues Storm - Kafka
>
> Nilesh,
> You can safely run two topologies reading from the same topic. You can
> use the kafka spout from storm to achieve this.
> If you use a single consumer group across the two topologies you are
> distributing the data between them, and the topic is not read twice.
> If you want to use consumer groups you need to give the two topologies
> unique group names.
>
> Please read the following doc:
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> "The Consumer Group name is global across a Kafka cluster, so you
> should be careful that any 'old' logic Consumers be shutdown before
> starting new code. When a new process is started with the same
> Consumer Group name, Kafka will add that
processes' threads to the set of threads available to consume the Topic
and trigger a 're-balance'. During this re-balance Kafka will assign
available partitions to available threads, possibly moving a partition
to another process. If you have a mixture of old and new business
logic, it is possible that some messages go to the old logic."
>
> From your use case I don't see why you can't use the KafkaSpout from
> storm. You can have multiple topologies reading from the same topic,
> or use multiple bolts in a single topology to do different operations
> on a tuple.
> -Harsha
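The distribution Harsha describes, where a single consumer group splits a topic's partitions between its members so each member sees only part of the data, can be illustrated with a toy round-robin assignment (hypothetical names; not kafka's actual assignment code):

```python
# Toy partition assignment: within one consumer group, each partition is
# owned by exactly one member, so two topologies sharing a group id each
# receive only a subset of the topic.
def assign_partitions(partitions, members):
    assignment = {m: [] for m in members}
    for i, p in enumerate(partitions):
        assignment[members[i % len(members)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
one_group = assign_partitions(partitions, ["topology-1", "topology-2"])
# {'topology-1': [0, 2], 'topology-2': [1, 3]} -- the data is split

# With separate groups, each topology is the sole member of its own
# group and therefore consumes every partition:
t1_alone = assign_partitions(partitions, ["topology-1"])
# {'topology-1': [0, 1, 2, 3]}
```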
>
>
>
>
> On Thu, Nov 6, 2014, at 10:14 AM, Nilesh Chhapru wrote:
>> Hi Harsha,
>>
>> I wanted to broadcast one message to two consumers, that is, to the
>> spouts in two topologies, for which I read about consumer groups in
>> the kafka docs; but this isn't supported by the simple consumer
>> provided by storm kafka.
>>
>> Hence I had to move to the high level consumer API, but I am a bit
>> doubtful, as some blogs say that it does a batch offset commit. Do
>> you have more details on this, or are you using the high level api in
>> any of your applications?
>>
>> Also, is there a way to broadcast a message from kafka using the
>> simple consumer provided by the storm kafka integration?
>>
>>
>> *Regards*,
>> *Nilesh Chhapru.*
>>
>> *From:* Harsha [mailto:storm@harsha.io]
>>
>> *Sent:* 06 November 2014 09:57 PM
>> *To:* user@storm.apache.org
>> *Subject:* Re: Issues Storm - Kafka
>>
>> Nilesh,
>> I thought you were using
>> https://github.com/apache/storm/tree/master/external/storm-kafka. Any
>> reason for you to use a kafkaSpout with consumer group support?
>> It handles replays based on ack or fail. The linked KafkaSpout uses
>> the simple api, which allows it to go back and forth in the kafka
>> queue; that is not part of the high-level consumer api (this is the
>> api where consumer groups are supported).
>> If you have two topologies doing different operations and you are
>> using consumer groups, then you should use a different consumer group
>> for each. If you use a single consumer group, data from the kafka
>> queue will be distributed between the two topologies, so each
>> topology gets only part of the data.
>> My suggestion would be to use the above kafkaspout if the only reason
>> you are using https://github.com/HolmesNL/kafka-spout is consumer
>> groups.
>>
>> Here is a link to kafka higher level api
>> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
>> "Why use the High Level Consumer
>> Sometimes the logic to read messages from Kafka doesn't care about
>> handling the message offsets, it just wants the data. So the High
>> Level Consumer is provided to abstract most of the details of
>> consuming events from Kafka."
>> With storm you want control over handling the message offsets. If a
>> message fails in a downstream bolt you want to roll back the offset
>> and replay the tuple from kafka. With the higher level api you won't
>> be able to do that.
>>
>> -Harsha
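The offset control described above can be sketched as follows: the spout commits only up to the first un-acked offset, so a fail (or timeout) forces a replay from that point, which is also why already-processed messages can be delivered again. An illustrative model, not the storm-kafka implementation:

```python
# Sketch of ack/fail offset handling: the committed offset only advances
# past offsets that have been acked, so a failed tuple is replayed.
class OffsetTracker:
    def __init__(self):
        self.emitted = []   # offsets handed to the topology, in order
        self.acked = set()

    def emit(self, offset):
        self.emitted.append(offset)

    def ack(self, offset):
        self.acked.add(offset)

    def commit_offset(self):
        # commit only up to the first un-acked offset
        for off in self.emitted:
            if off not in self.acked:
                return off
        return self.emitted[-1] + 1 if self.emitted else 0

tracker = OffsetTracker()
for off in [0, 1, 2]:
    tracker.emit(off)
tracker.ack(0)
tracker.ack(2)          # offset 1 failed or timed out
restart_at = tracker.commit_offset()
# restart_at == 1: the spout replays from the failed offset, so offset 2
# is delivered twice (the duplication seen in point 2 of the thread)
```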
>>
>>
>>
>> On Thu, Nov 6, 2014, at 07:26 AM, Nilesh Chhapru wrote:
>>> Hi Harsha / Shamsul,
>>>
>>> Thanks for your inputs.
>>> I am using BaseBasicBolt, so it calls the ack method automatically;
>>> hence I am not explicitly doing the same.
>>>
>>> Moreover, for the consumer group I have now moved the KafkaSpout to
>>> https://github.com/HolmesNL/kafka-spout to get consumer group id
>>> support; let me know if you have used this at any time.
>>>
>>> I don't need the 2 consumers to coordinate, but we have 2 topologies
>>> listening to one kafka topic and doing different operations on the
>>> same data, like saving to the database and passing it to a validator.
>>>
>>> Do email in-case you need any other information.
>>>
>>> *Regards*,
>>> *Nilesh Chhapru.*
>>>
>>> *From:* Harsha [mailto:storm@harsha.io]
>>>
>>> *Sent:* 06 November 2014 08:36 PM
>>> *To:* user@storm.apache.org
>>> *Subject:* Re: Issues Storm - Kafka
>>>
>>> Nilesh and Shamsul,
>>> 2) You don't need to use another database to keep track of processed
>>> tuples. Are you sure you are doing tuple ack and fail in the
>>> downstream bolts, so that the kafkaspout knows it processed the
>>> tuple? Tuple replays occur if there are timeouts or exceptions where
>>> you call fail on a tuple.
>>>
>>>>> 3) The consumer group isn't working properly for storm Kafka
>>>>> integration.
>>>>> a. When we give the same group id to the Kafka consumers of
>>>>> different topologies, both still read the same messages.
>>>>> b. If we have 2 different consumers with different consumer group
>>>>> ids in different topologies it works fine if both topologies are
>>>>> deployed at the same time, but not if we deploy one of them after
>>>>> some of the messages are already loaded in the topic and read by
>>>>> the first topology.
>>> a. The Kafka Spout uses the simple consumer api; it doesn't need a
>>> consumer group. Can you give us more details on why you need the two
>>> topologies to coordinate (i.e. use the same consumer group)?
>>> Thanks,
>>> Harsha
>>>
>>>
>>> On Thu, Nov 6, 2014, at 04:27 AM, Shamsul Haque wrote:
>>>> Hi Nilesh,
>>>>
>>>> For point 1, try increasing 'topology.message.timeout.secs' to 10
>>>> to 15 mins or more, then slowly decrease it to whatever suits your
>>>> topology. That worked for me in the same case.
>>>> For point 2, we have used a database to keep track of what we have
>>>> processed, so we don't process the same tuple again.
>>>>
>>>> regards
>>>> Shams
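Shamsul's point 1 matches how the message timeout behaves: a tuple whose ack arrives after topology.message.timeout.secs counts as failed and is replayed even though no bolt threw an exception, so an average latency below the timeout does not rule out replays caused by slow outliers. A toy illustration (values hypothetical):

```python
# Toy model: a tuple is failed (and replayed) if its ack does not arrive
# within the message timeout, even when no exception was thrown anywhere.
TIMEOUT_SECS = 30

def classify(tuples):
    """tuples: list of (tuple_id, processing_time_secs)."""
    acked = [tid for tid, took in tuples if took <= TIMEOUT_SECS]
    failed = [tid for tid, took in tuples if took > TIMEOUT_SECS]
    return acked, failed

# Average processing time here is 15s, well under the 30s timeout, yet
# the one slow tuple still times out and gets replayed:
acked, failed = classify([("t1", 2), ("t2", 3), ("t3", 40)])
# acked == ["t1", "t2"], failed == ["t3"]
```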
>>>>
>>>> On Thursday 06 November 2014 12:16 PM, Nilesh Chhapru wrote:
>>>>> Hi All,
>>>>>
>>>>> We are using storm Kafka integration where a Spout reads from a
>>>>> Kafka topic.
>>>>>
>>>>> Following is the version of storm, Kafka and zookeeper we are
>>>>> using.
>>>>> *Storm : apache-storm-0.9.2-incubating*
>>>>> *Kafka : kafka_2.8.0-0.8.1.1*
>>>>> *Zookeeper : zookeeper-3.4.6*
>>>>>
>>>>> I am facing the following issues at the spout.
>>>>> 1) Messages get failed even though the average time taken is less
>>>>> than the topology.message.timeout.secs value, and we aren't
>>>>> getting any exceptions at any of the bolts.
>>>>> 2) The topology finally emits to a Kafka producer, i.e. some other
>>>>> topic, but the messages are getting duplicated because of replays.
>>>>> 3) The consumer group isn't working properly for storm Kafka
>>>>> integration.
>>>>> a. When we give the same group id to the Kafka consumers of
>>>>> different topologies, both still read the same messages.
>>>>> b. If we have 2 different consumers with different consumer group
>>>>> ids in different topologies it works fine if both topologies are
>>>>> deployed at the same time, but not if we deploy one of them after
>>>>> some of the messages are already loaded in the topic and read by
>>>>> the first topology.
>>>>>
>>>>> Kindly help me with the above points, as this is hampering the
>>>>> overall scope of the project as well as the timelines.
>>>>>
>>>>> Do call or email in case you need any other information.
>>>>>
>>>>>
>>>>> *Nilesh Chhapru,*
>>>>> Phone: +91 9619030491
RE: Issues Storm - Kafka
Posted by Nilesh Chhapru <ni...@ugamsolutions.com>.
Harsha,
I wanted the following scenario, which isn't working if I use the simple consumer.
1) Topology-1(T1) reads a kafka topic(KT1).
2) Topology-2(T2) reads the same topic (KT1)
But the two topologies are not deployed at the same time. When T1 is deployed it starts reading KT1, but if I bring up T2 later it doesn't read from the start or from where it last left off, because the offset has moved on while T1 was reading the topic. I know we can set the T2 spout's offset to read from the beginning, but I can't do that, as it must not re-read everything from the start every time; it has to resume from where it stopped when it was last un-deployed.
I read in several blogs that zookeeper saves the offset against the consumer group id, hence I wanted to use the same.
Regards,
Nilesh Chhapru.
From: Harsha [mailto:storm@harsha.io]
Sent: 07 November 2014 12:13 AM
To: user@storm.apache.org
Subject: Re: Issues Storm - Kafka
Nilesh,
You can safely run two topologies reading from same topic twice. You can use kafka spout from storm to achieve this.
If you are using single consumer group in two topologies you are distributing the data into two topologies and it doesn't read the same topic twice. If you want to use consumer group you need to give unique names for the two topologies.
please read the following doc https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
"The Consumer Group name is global across a Kafka cluster, so you should be careful that any 'old' logic Consumers be shutdown before starting new code. When a new process is started with the same Consumer Group name, Kafka will add that processes' threads to the set of threads available to consume the Topic and trigger a 're-balance'. During this re-balance Kafka will assign available partitions to available threads, possibly moving a partition to another process. If you have a mixture of old and new business logic, it is possible that some messages go to the old logic."
From you use case I don't see why you can't use KafkaSpout from storm. You can use multiple topologies reading from same topic or use multiple bolts in a single topology to do different operations on a tuple.
-Harsha
On Thu, Nov 6, 2014, at 10:14 AM, Nilesh Chhapru wrote:
Hi Harsha,
I wanted to broadcast one message to two consumer that is spouts in two topology, for which I read about consumer group in kafka docs, but this isn’t supported by the simple consumer provided by storm kafka.
Hence had to move to a high level consumer API, but a bit doubtful as some of the blogs says that it do a batch offset commit, do you have more details on this, or are you using high level api in any of you applications.
Also is there a way to broadcast a message from kafka using simple consumer provided by storm kafka integration.
Regards,
Nilesh Chhapru.
From: Harsha [mailto:storm@harsha.io]
Sent: 06 November 2014 09:57 PM
To: user@storm.apache.org<ma...@storm.apache.org>
Subject: Re: Issues Storm - Kafka
Nilesh,
I thought you are using https://github.com/apache/storm/tree/master/external/storm-kafka. Any reason for you to use the kafkaSpout with consumer group support?
It handles the replays based on ack or fail. The linked KafkaSpout uses simpleApi which allows it go back n forth in the kafka queue which is not part of high-level consumer api ( this is the api where consumer groups are supported).
If you have two topologies and doing different operations and you are using consumer group than you should use different consumer group. If you are using single consumer group , data from kafka queue will be distributed to two topologies. So each topology gets part of the data.
My suggestion would be to use above kafkaspout If the only reason you are using https://github.com/HolmesNL/kafka-spout is for consumer groups.
Here is a link to kafka higher level api
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
"Why use the High Level Consumer
Sometimes the logic to read messages from Kafka doesn't care about handling the message offsets, it just wants the data. So the High Level Consumer is provided to abstract most of the details of consuming events from Kafka."
With storm you want control over handling the message offsets. If a message failed in a downstream bolt you want roll back the offset and replay the tuple from kafka. With higher level api you won't be able todo that.
-Harsha
On Thu, Nov 6, 2014, at 07:26 AM, Nilesh Chhapru wrote:
Hi Harsha / Shamsul,
Thanks for your inputs.
I am using BasicBasicBolt so it call the ack method automatically hence now explicitly doing the same.
Moreover for consumer group I have now moved KafkaSpout to https://github.com/HolmesNL/kafka-spout for getting the consumer group id, let me know if you have used this anytime.
I don’t need 2 consumer to coordinate but we have 2 topologies listening to one kafka topic and doing different operations on the same live saving to database and passing it to validator.
Do email in-case you need any other information.
Regards,
Nilesh Chhapru.
From: Harsha [mailto:storm@harsha.io]
Sent: 06 November 2014 08:36 PM
To: user@storm.apache.org<ma...@storm.apache.org>
Subject: Re: Issues Storm - Kafka
NIlesh and Shamsul,
2) you don't need to use another database to keep track processed tuples. Are you sure you are doing tuple ack and fail in the downstream bolts so that kafkaspout knows it processed the tuple. Tuple replays occurs if there are timeouts happening or incase of exceptions where you call fail on a tuple.
3)The consumer group is isn’t working properly for storm Kafka integration.
a.When we give same group id to the Kafka consumer of different topology but still both are reading same messages.
b.If we have 2 different consumer with different consumer group id in different topology it works fine if both topologies are deployed at the same time, but doesn’t if we deploy one of them after some of the message are already loaded in the topic and read by the first topology.
a. Kafka Spout uses simple consumer api it doesn't need a consumer group. can you give us more details why you need two topologies to use coordinate? (i.e use the same consumer group).
Thanks,
Harsha
On Thu, Nov 6, 2014, at 04:27 AM, Shamsul Haque wrote:
Hi Nilesh,
For point 1, try by increasing the 'topology.message.timeout.secs' to 10 to 15 mins or more then slowly decrease it which suits your topology. For me that worked for the same case.
For point 2, we have used database to made track what we have processed, so don't process the same tuple again.
regards
Shams
On Thursday 06 November 2014 12:16 PM, Nilesh Chhapru wrote:
Hi All,
We are using storm Kafka integration where a Spout reads from a Kafka topic.
Following is the version of storm, Kafka and zookeeper we are using.
Strom : apache-storm-0.9.2-incubating
Kafka : kafka_2.8.0-0.8.1.1
Zookeeper : zookeeper-3.4.6
I am facing the following issues at the spout.
1) Messages fail even though the average time taken is less than the topology.message.timeout.secs value, and we aren't getting any exceptions in any of the bolts.
2) The topology finally emits to a Kafka producer (i.e. some other topic), but the messages are getting duplicated because of replay issues.
3) The consumer group isn't working properly for the storm-kafka integration.
a. When we give the same group id to the Kafka consumers of different topologies, both still read the same messages.
b. If we have 2 consumers with different consumer group ids in different topologies, it works fine if both topologies are deployed at the same time, but not if we deploy one of them after some messages have already been loaded into the topic and read by the first topology.
Kindly help me with the above points, as this is hampering the overall scope and timelines of the project.
Do call or email in case you need any other information.
Nilesh Chhapru,
Phone: +91 9619030491
________________________________
---------------------------------------------------------------------------------------Disclaimer----------------------------------------------------------------------------------------------
****Opinions expressed in this e-mail are those of the author and do not necessarily represent those of Ugam. Ugam does not accept any responsibility or liability for it. This e-mail message may contain proprietary, confidential or legally privileged information for the sole use of the person or entity to whom this message was originally addressed. Any review, re-transmission, dissemination or other use of or taking of any action in reliance upon this information by persons or entities other than the intended recipient is prohibited. If you have received this e-mail in error, please delete it and all attachments from any servers, hard drives or any other media.
Warning: Sufficient measures have been taken to scan any presence of viruses however the recipient should check this email and any attachments for the presence of viruses. Ugam accepts no liability for any damage caused by any virus transmitted by this email. ****
Re: Issues Storm - Kafka
Posted by Harsha <st...@harsha.io>.
Nilesh, you can safely run two topologies reading from the same topic.
You can use the kafka spout from storm to achieve this. If you use a
single consumer group across two topologies, you are distributing the data
between the two topologies, and neither reads the full topic. If you
want to use consumer groups, you need to give unique group names to the two
topologies.
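The distribution described above can be simulated in a few lines (a toy round-robin assignment for illustration, not Kafka's actual rebalance algorithm):

```python
def assign_partitions(partitions, consumers_by_group):
    """Sketch of Kafka's group semantics: within one group, partitions
    are split among the consumers; each distinct group independently
    gets the full topic."""
    assignment = {}
    for group, consumers in consumers_by_group.items():
        assignment[group] = {c: [] for c in consumers}
        for i, p in enumerate(partitions):
            assignment[group][consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
# Same group: topology-A and topology-B each see only part of the data.
shared = assign_partitions(partitions, {"g1": ["topology-A", "topology-B"]})
# Different groups: each topology sees all four partitions.
separate = assign_partitions(partitions,
                             {"gA": ["topology-A"], "gB": ["topology-B"]})
print(shared["g1"])    # {'topology-A': [0, 2], 'topology-B': [1, 3]}
print(separate["gA"])  # {'topology-A': [0, 1, 2, 3]}
```

So "broadcast to both topologies" means different group names, while a shared group name splits the topic between them.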
please read the following doc
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
"The Consumer Group name is global across a Kafka cluster, so you should
be careful that any 'old' logic Consumers be shutdown before starting
new code. When a new process is started with the same Consumer Group
name, Kafka will add that processes' threads to the set of threads
available to consume the Topic and trigger a 're-balance'. During this
re-balance Kafka will assign available partitions to available threads,
possibly moving a partition to another process. If you have a mixture of
old and new business logic, it is possible that some messages go to the
old logic."
From your use case I don't see why you can't use the KafkaSpout from storm.
You can have multiple topologies reading from the same topic, or use
multiple bolts in a single topology to do different operations on a
tuple. -Harsha
On Thu, Nov 6, 2014, at 10:14 AM, Nilesh Chhapru wrote:
> Hi Harsha,
>
> I wanted to broadcast one message to two consumers, that is, spouts in
> two topologies, for which I read about consumer groups in the kafka docs,
> but this isn't supported by the simple consumer provided by storm-kafka.
>
> Hence I had to move to the high-level consumer API, but I'm a bit
> doubtful, as some blogs say it does batched offset commits. Do you have
> more details on this, or are you using the high-level API in any of your
> applications?
>
> Also, is there a way to broadcast a message from kafka using the simple
> consumer provided by the storm-kafka integration?
>
>
> *Regards*,
> *Nilesh Chhapru.*
>
> *From:* Harsha [mailto:storm@harsha.io]
>
> *Sent:* 06 November 2014 09:57 PM *To:* user@storm.apache.org
> *Subject:* Re: Issues Storm - Kafka
>
> Nilesh,
> I thought you were using
> https://github.com/apache/storm/tree/master/external/storm-kafka. Any
> reason for you to use a kafka spout with consumer group support?
> It handles replays based on ack or fail. The linked KafkaSpout
> uses the simple API, which allows it to go back and forth in the kafka
> queue; that is not part of the high-level consumer API (the API where
> consumer groups are supported).
> If you have two topologies doing different operations and you are
> using consumer groups, then each should use a different consumer group.
> If you use a single consumer group, data from the kafka queue will be
> distributed between the two topologies, so each topology gets only part
> of the data.
> My suggestion would be to use the above kafkaspout if the only reason
> you are using https://github.com/HolmesNL/kafka-spout is
> consumer groups.
>
> Here is a link to kafka higher level api
> https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
> "Why use the High Level Consumer
> Sometimes the logic to read messages from Kafka doesn't care about
> handling the message offsets, it just wants the data. So the High
> Level Consumer is provided to abstract most of the details of
> consuming events from Kafka."
> With storm you want control over handling the message offsets. If a
> message fails in a downstream bolt, you want to roll back the offset and
> replay the tuple from kafka. With the high-level API you won't be able
> to do that.
>
> -Harsha
>
>
>
> On Thu, Nov 6, 2014, at 07:26 AM, Nilesh Chhapru wrote:
>> Hi Harsha / Shamsul,
>>
>> Thanks for your inputs.
>> I am using BaseBasicBolt, so it calls the ack method automatically;
>> hence I am not explicitly doing the same.
>>
>> Moreover for consumer group I have now moved KafkaSpout to
>> https://github.com/HolmesNL/kafka-spout for getting the consumer
>> group id, let me know if you have used this anytime.
>>
>> I don't need the 2 consumers to coordinate, but we have 2 topologies
>> listening to one kafka topic and doing different operations on the
>> same, like saving to the database and passing it to a validator.
>>
>> Do email in-case you need any other information.
>>
>> *Regards*,
>> *Nilesh Chhapru.*
>>
>> *From:* Harsha [mailto:storm@harsha.io]
>>
>> *Sent:* 06 November 2014 08:36 PM *To:* user@storm.apache.org
>> *Subject:* Re: Issues Storm - Kafka
>>
>> NIlesh and Shamsul,
>> 2) You don't need another database to keep track of processed
>> tuples. Are you sure you are calling ack and fail on tuples in the
>> downstream bolts, so that the KafkaSpout knows a tuple was processed?
>> Tuple replays occur when timeouts happen or when an exception leads
>> you to call fail on a tuple.
>>
>>>> 3) The consumer group isn't working properly for the storm-kafka
>>>> integration.
>>>> a. When we give the same group id to the Kafka consumers of different
>>>> topologies, both still read the same messages.
>>>> b. If we have 2 consumers with different consumer group ids
>>>> in different topologies, it works fine if both topologies are
>>>> deployed at the same time, but not if we deploy one of them
>>>> after some messages have already been loaded into the topic and
>>>> read by the first topology.
>> a. The Kafka spout uses the simple consumer API; it doesn't need a
>> consumer group. Can you give us more details on why you need the two
>> topologies to coordinate (i.e. use the same consumer group)?
>> Thanks,
>> Harsha
>>
>>
>> On Thu, Nov 6, 2014, at 04:27 AM, Shamsul Haque wrote:
>>> Hi Nilesh,
>>>
>>> For point 1, try increasing 'topology.message.timeout.secs'
>>> to 10 to 15 minutes or more, then slowly decrease it to a value that
>>> suits your topology. That worked for me in the same situation.
>>> For point 2, we used a database to keep track of what we have
>>> processed, so we don't process the same tuple again.
>>>
>>> regards
>>> Shams
>>>
>>> On Thursday 06 November 2014 12:16 PM, Nilesh Chhapru wrote:
>>>> Hi All,
>>>>
>>>> We are using storm Kafka integration where a Spout reads from a
>>>> Kafka topic.
>>>>
>>>> Following is the version of storm, Kafka and zookeeper we are
>>>> using.
>>>> *Storm : apache-storm-0.9.2-incubating*
>>>> *Kafka : kafka_2.8.0-0.8.1.1*
>>>> *Zookeeper : zookeeper-3.4.6*
>>>>
>>>> I am facing the following issues at the spout.
>>>> 1) Messages fail even though the average time taken is less
>>>> than the topology.message.timeout.secs value, and we aren't getting
>>>> any exceptions in any of the bolts.
>>>> 2) The topology finally emits to a Kafka producer (i.e. some
>>>> other topic), but the messages are getting duplicated because of
>>>> replay issues.
>>>> 3) The consumer group isn't working properly for the storm-kafka
>>>> integration.
>>>> a. When we give the same group id to the Kafka consumers of different
>>>> topologies, both still read the same messages.
>>>> b. If we have 2 consumers with different consumer group ids
>>>> in different topologies, it works fine if both topologies are
>>>> deployed at the same time, but not if we deploy
>>>> one of them after some messages have already been loaded into the
>>>> topic and read by the first topology.
>>>>
>>>> Kindly help me with the above points, as this is hampering the
>>>> overall scope and timelines of the project.
>>>>
>>>> Do call or email in case you need any other information.
>>>>
>>>>
>>>> *Nilesh Chhapru,*
>>>> Phone: +91 9619030491
>>>>
RE: Issues Storm - Kafka
Posted by Nilesh Chhapru <ni...@ugamsolutions.com>.
Hi Harsha,
I wanted to broadcast one message to two consumers, that is, spouts in two topologies, for which I read about consumer groups in the kafka docs, but this isn't supported by the simple consumer provided by storm-kafka.
Hence I had to move to the high-level consumer API, but I'm a bit doubtful, as some blogs say it does batched offset commits. Do you have more details on this, or are you using the high-level API in any of your applications?
Also, is there a way to broadcast a message from kafka using the simple consumer provided by the storm-kafka integration?
Regards,
Nilesh Chhapru.
From: Harsha [mailto:storm@harsha.io]
Sent: 06 November 2014 09:57 PM
To: user@storm.apache.org
Subject: Re: Issues Storm - Kafka
Nilesh,
I thought you were using https://github.com/apache/storm/tree/master/external/storm-kafka. Any reason for you to use a kafka spout with consumer group support?
It handles replays based on ack or fail. The linked KafkaSpout uses the simple API, which allows it to go back and forth in the kafka queue; that is not part of the high-level consumer API (the API where consumer groups are supported).
If you have two topologies doing different operations and you are using consumer groups, then each should use a different consumer group. If you use a single consumer group, data from the kafka queue will be distributed between the two topologies, so each topology gets only part of the data.
My suggestion would be to use the above kafkaspout if the only reason you are using https://github.com/HolmesNL/kafka-spout is consumer groups.
Here is a link to kafka higher level api
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
"Why use the High Level Consumer
Sometimes the logic to read messages from Kafka doesn't care about handling the message offsets, it just wants the data. So the High Level Consumer is provided to abstract most of the details of consuming events from Kafka."
With storm you want control over handling the message offsets. If a message fails in a downstream bolt, you want to roll back the offset and replay the tuple from kafka. With the high-level API you won't be able to do that.
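A toy sketch of that offset control (illustrative only, in Python; this is not the actual SimpleConsumer API): because the spout owns the offset, a failed tuple can be re-read by rolling the offset back, which the high-level consumer's automatic offset commits do not allow.

```python
class SimpleConsumerSketch:
    """Toy model of why storm-kafka uses the simple consumer API:
    the spout controls the read offset itself, so it can rewind
    and replay after a downstream failure."""
    def __init__(self, log):
        self.log = log      # the partition's message log
        self.offset = 0     # next offset to read, owned by the spout

    def fetch(self):
        msg = self.log[self.offset]
        self.offset += 1
        return msg

    def rollback_to(self, offset):
        # Rewind so the failed message is fetched (replayed) again.
        self.offset = offset

c = SimpleConsumerSketch(["m0", "m1", "m2"])
c.fetch(); c.fetch()   # read m0, m1
c.rollback_to(1)       # m1 failed in a downstream bolt
print(c.fetch())       # m1 (replayed)
```

With the high-level consumer, the committed offset would already have moved past m1, so the failed message could not be re-fetched this way.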
-Harsha
On Thu, Nov 6, 2014, at 07:26 AM, Nilesh Chhapru wrote:
Hi Harsha / Shamsul,
Thanks for your inputs.
I am using BaseBasicBolt, so it calls the ack method automatically; hence I am not explicitly doing the same.
Moreover for consumer group I have now moved KafkaSpout to https://github.com/HolmesNL/kafka-spout for getting the consumer group id, let me know if you have used this anytime.
I don't need the 2 consumers to coordinate, but we have 2 topologies listening to one kafka topic and doing different operations on the same, like saving to the database and passing it to a validator.
Do email in-case you need any other information.
Regards,
Nilesh Chhapru.
From: Harsha [mailto:storm@harsha.io]
Sent: 06 November 2014 08:36 PM
To: user@storm.apache.org
Subject: Re: Issues Storm - Kafka
NIlesh and Shamsul,
2) You don't need another database to keep track of processed tuples. Are you sure you are calling ack and fail on tuples in the downstream bolts, so that the KafkaSpout knows a tuple was processed? Tuple replays occur when timeouts happen or when an exception leads you to call fail on a tuple.
3) The consumer group isn't working properly for the storm-kafka integration.
a. When we give the same group id to the Kafka consumers of different topologies, both still read the same messages.
b. If we have 2 consumers with different consumer group ids in different topologies, it works fine if both topologies are deployed at the same time, but not if we deploy one of them after some messages have already been loaded into the topic and read by the first topology.
a. The Kafka spout uses the simple consumer API; it doesn't need a consumer group. Can you give us more details on why you need the two topologies to coordinate (i.e. use the same consumer group)?
Thanks,
Harsha
On Thu, Nov 6, 2014, at 04:27 AM, Shamsul Haque wrote:
Hi Nilesh,
For point 1, try increasing 'topology.message.timeout.secs' to 10 to 15 minutes or more, then slowly decrease it to a value that suits your topology. That worked for me in the same situation.
For point 2, we used a database to keep track of what we have processed, so we don't process the same tuple again.
regards
Shams
On Thursday 06 November 2014 12:16 PM, Nilesh Chhapru wrote:
Hi All,
We are using storm Kafka integration where a Spout reads from a Kafka topic.
Following is the version of storm, Kafka and zookeeper we are using.
Storm : apache-storm-0.9.2-incubating
Kafka : kafka_2.8.0-0.8.1.1
Zookeeper : zookeeper-3.4.6
I am facing the following issues at the spout.
1) Messages fail even though the average time taken is less than the topology.message.timeout.secs value, and we aren't getting any exceptions in any of the bolts.
2) The topology finally emits to a Kafka producer (i.e. some other topic), but the messages are getting duplicated because of replay issues.
3) The consumer group isn't working properly for the storm-kafka integration.
a. When we give the same group id to the Kafka consumers of different topologies, both still read the same messages.
b. If we have 2 consumers with different consumer group ids in different topologies, it works fine if both topologies are deployed at the same time, but not if we deploy one of them after some messages have already been loaded into the topic and read by the first topology.
Kindly help me with the above points, as this is hampering the overall scope and timelines of the project.
Do call or email in case you need any other information.
Nilesh Chhapru,
Phone: +91 9619030491
Re: Issues Storm - Kafka
Posted by Harsha <st...@harsha.io>.
Nilesh, I thought you were using
https://github.com/apache/storm/tree/master/external/storm-kafka. Any
reason for you to use a kafka spout with consumer group support? It
handles replays based on ack or fail. The linked KafkaSpout uses the
simple API, which allows it to go back and forth in the kafka queue;
that is not part of the high-level consumer API (the API where consumer
groups are supported). If you have two topologies doing different
operations and you are using consumer groups, then each should use a
different consumer group. If you use a single consumer group, data
from the kafka queue will be distributed between the two topologies, so
each topology gets only part of the data. My suggestion would be to use
the above kafkaspout if the only reason you are using
https://github.com/HolmesNL/kafka-spout is consumer groups.
Here is a link to kafka higher level api
https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
"Why use the High Level Consumer
Sometimes the logic to read messages from Kafka doesn't care about
handling the message offsets, it just wants the data. So the High Level
Consumer is provided to abstract most of the details of consuming events
from Kafka."
With storm you want control over handling the message offsets. If a
message fails in a downstream bolt, you want to roll back the offset and
replay the tuple from kafka. With the high-level API you won't be able
to do that.
-Harsha
On Thu, Nov 6, 2014, at 07:26 AM, Nilesh Chhapru wrote:
> Hi Harsha / Shamsul,
>
> Thanks for your inputs.
> I am using BaseBasicBolt, so it calls the ack method automatically;
> hence I am not explicitly doing the same.
>
> Moreover for consumer group I have now moved KafkaSpout to
> https://github.com/HolmesNL/kafka-spout for getting the consumer group
> id, let me know if you have used this anytime.
>
> I don't need the 2 consumers to coordinate, but we have 2 topologies
> listening to one kafka topic and doing different operations on the
> same, like saving to the database and passing it to a validator.
>
> Do email in-case you need any other information.
>
> *Regards*,
> *Nilesh Chhapru.*
>
> *From:* Harsha [mailto:storm@harsha.io]
>
> *Sent:* 06 November 2014 08:36 PM *To:* user@storm.apache.org
> *Subject:* Re: Issues Storm - Kafka
>
> NIlesh and Shamsul,
> 2) You don't need another database to keep track of processed
> tuples. Are you sure you are calling ack and fail on tuples in the
> downstream bolts, so that the KafkaSpout knows a tuple was processed?
> Tuple replays occur when timeouts happen or when an exception leads
> you to call fail on a tuple.
>
>>> 3) The consumer group isn't working properly for the storm-kafka
>>> integration.
>>> a. When we give the same group id to the Kafka consumers of different
>>> topologies, both still read the same messages.
>>> b. If we have 2 consumers with different consumer group ids
>>> in different topologies, it works fine if both topologies are
>>> deployed at the same time, but not if we deploy one of them
>>> after some messages have already been loaded into the topic and read
>>> by the first topology.
> a. The Kafka spout uses the simple consumer API; it doesn't need a
> consumer group. Can you give us more details on why you need the two
> topologies to coordinate (i.e. use the same consumer group)?
> Thanks,
> Harsha
>
>
> On Thu, Nov 6, 2014, at 04:27 AM, Shamsul Haque wrote:
>> Hi Nilesh,
>>
>> For point 1, try increasing 'topology.message.timeout.secs' to
>> 10 to 15 minutes or more, then slowly decrease it to a value that
>> suits your topology. That worked for me in the same situation.
>> For point 2, we used a database to keep track of what we have
>> processed, so we don't process the same tuple again.
>>
>> regards
>> Shams
>>
>> On Thursday 06 November 2014 12:16 PM, Nilesh Chhapru wrote:
>>> Hi All,
>>>
>>> We are using storm Kafka integration where a Spout reads from a
>>> Kafka topic.
>>>
>>> Following is the version of storm, Kafka and zookeeper we are using.
>>> *Storm : apache-storm-0.9.2-incubating*
>>> *Kafka : kafka_2.8.0-0.8.1.1*
>>> *Zookeeper : zookeeper-3.4.6*
>>>
>>> I am facing the following issues at the spout.
>>> 1) Messages fail even though the average time taken is less
>>> than the topology.message.timeout.secs value, and we aren't getting
>>> any exceptions in any of the bolts.
>>> 2) The topology finally emits to a Kafka producer (i.e. some
>>> other topic), but the messages are getting duplicated because of
>>> replay issues.
>>> 3) The consumer group isn't working properly for the storm-kafka
>>> integration.
>>> a. When we give the same group id to the Kafka consumers of different
>>> topologies, both still read the same messages.
>>> b. If we have 2 consumers with different consumer group ids
>>> in different topologies, it works fine if both topologies are
>>> deployed at the same time, but not if we deploy one of them after
>>> some messages have already been loaded into the topic and read by the
>>> first topology.
>>>
>>> Kindly help me with the above points, as this is hampering the
>>> overall scope and timelines of the project.
>>>
>>> Do call or email in case you need any other information.
>>>
>>>
>>> *Nilesh Chhapru,*
>>> Phone: +91 9619030491
RE: Issues Storm - Kafka
Posted by Nilesh Chhapru <ni...@ugamsolutions.com>.
Hi Harsha / Shamsul,
Thanks for your inputs.
I am using BaseBasicBolt, which calls the ack method automatically, so I am not acking explicitly.
Moreover, for consumer-group support I have now moved to the KafkaSpout at https://github.com/HolmesNL/kafka-spout, which takes a consumer group id; let me know if you have used it at any time.
I don't need the two consumers to coordinate, but we have two topologies listening to one Kafka topic and doing different operations on the same messages, such as saving to a database and passing to a validator.
Do email in case you need any other information.
Regards,
Nilesh Chhapru.
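[Since the BaseBasicBolt vs. BaseRichBolt ack behavior is the crux here, a minimal sketch of the contract may help. These are stand-in classes, not the real Storm API: they only illustrate who is responsible for calling ack().]

```java
// Sketch of the ack contract distinguishing "basic" bolts (framework acks
// for you after execute() returns) from "rich" bolts (you must ack or fail
// every tuple yourself). Stand-in classes, NOT the real Storm API.
import java.util.ArrayList;
import java.util.List;

public class AckSketch {
    // Stand-in for Storm's OutputCollector: records acks so we can inspect them.
    static class Collector {
        final List<String> acked = new ArrayList<>();
        void ack(String tupleId) { acked.add(tupleId); }
    }

    // "BasicBolt" style: the framework wrapper acks after execute() returns.
    interface BasicBolt { void execute(String tupleId); }

    // "RichBolt" style: the bolt itself must ack (or fail) each tuple.
    interface RichBolt { void execute(String tupleId, Collector collector); }

    static void runBasic(BasicBolt bolt, String tupleId, Collector c) {
        bolt.execute(tupleId);
        c.ack(tupleId); // framework acks automatically, as BaseBasicBolt does
    }

    static void runRich(RichBolt bolt, String tupleId, Collector c) {
        bolt.execute(tupleId, c); // forgetting to ack here => replay on timeout
    }

    public static void main(String[] args) {
        Collector c1 = new Collector();
        runBasic(id -> { /* process */ }, "m1", c1);   // ack is implicit

        Collector c2 = new Collector();
        runRich((id, col) -> col.ack(id), "m2", c2);   // ack is explicit

        System.out.println(c1.acked.size() + "," + c2.acked.size());
    }
}
```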
Re: Issues Storm - Kafka
Posted by Harsha <st...@harsha.io>.
Nilesh and Shamsul,
2) You don't need another database to keep track of processed tuples. Are
you sure you are acking and failing tuples in the downstream bolts, so that
the KafkaSpout knows each tuple was processed? Tuple replays occur when
timeouts happen, or when an exception leads you to call fail on a tuple.
>> 3) The consumer group isn't working properly for the Storm-Kafka
>> integration.
>> a. When we give the same group id to the Kafka consumers of different
>> topologies, both still read the same messages.
>> b. If we have two consumers with different consumer group ids in
>> different topologies, it works fine if both topologies are deployed at
>> the same time, but not if we deploy one of them after some of the
>> messages are already loaded in the topic and read by the first
>> topology.
a. The KafkaSpout uses the Kafka SimpleConsumer API; it doesn't need a
consumer group. Can you give us more details on why you need the two
topologies to coordinate (i.e. use the same consumer group)?
Thanks, Harsha
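[The replay rule Harsha states can be sketched in plain Java. This is a stdlib simulation of the idea, not Storm internals: a spout keeps pending tuples, and a tuple is replayed if it is failed or if it is not acked within the message timeout.]

```java
// Stdlib sketch (not Storm internals) of why tuples get replayed:
// ack() settles a tuple, fail() replays it immediately, and anything
// still pending past the timeout is replayed as if it had failed.
import java.util.HashMap;
import java.util.Map;

public class ReplaySketch {
    final Map<String, Long> pending = new HashMap<>(); // tupleId -> emit time (ms)
    final long timeoutMs;
    int replays = 0;

    ReplaySketch(long timeoutMs) { this.timeoutMs = timeoutMs; }

    void emit(String id, long now) { pending.put(id, now); }
    void ack(String id)  { pending.remove(id); }               // done, never replayed
    void fail(String id) { pending.remove(id); replays++; }    // immediate replay

    // Periodic check, like Storm's timeout mechanism: un-acked tuples
    // older than the timeout are treated as failed and replayed.
    void tick(long now) {
        pending.entrySet().removeIf(e -> {
            if (now - e.getValue() > timeoutMs) { replays++; return true; }
            return false;
        });
    }

    public static void main(String[] args) {
        ReplaySketch s = new ReplaySketch(30_000);
        s.emit("a", 0); s.emit("b", 0); s.emit("c", 0);
        s.ack("a");     // processed and acked: safe
        s.fail("b");    // explicit failure: replayed
        s.tick(60_000); // "c" was never acked: times out, replayed
        System.out.println(s.replays);
    }
}
```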
Re: Issues Storm - Kafka
Posted by Shamsul Haque <sh...@corp.india.com>.
Hi Nilesh,
For point 1, try increasing 'topology.message.timeout.secs' to 10-15
minutes or more, then slowly decrease it to a value that suits your
topology. That worked for me in the same situation.
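[To make the knob concrete: this is a per-topology setting, and can be set cluster-wide in storm.yaml or overridden per topology. The value below is illustrative.]

```yaml
# storm.yaml (or per-topology Config): maximum time a tuple may remain
# un-acked before the spout replays it. 600 s = 10 min, as suggested above.
topology.message.timeout.secs: 600
```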
For point 2, we used a database to keep track of what we have processed,
so we don't process the same tuple again.
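[The dedup approach Shamsul describes can be sketched as follows. This is a hypothetical stdlib illustration: an in-memory set stands in for the database table, and the message ids are made up.]

```java
// Dedup sketch: remember ids of already-processed messages (here a
// HashSet standing in for a database table) and skip replayed duplicates.
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class DedupSketch {
    private final Set<String> processedIds = new HashSet<>(); // a DB in production
    private final List<String> output = new ArrayList<>();

    // Process the payload only if this message id has not been seen before.
    void handle(String messageId, String payload) {
        if (!processedIds.add(messageId)) return; // duplicate replay: skip
        output.add(payload);                      // e.g. emit to the output topic
    }

    public static void main(String[] args) {
        DedupSketch d = new DedupSketch();
        d.handle("offset-1", "hello");
        d.handle("offset-2", "world");
        d.handle("offset-1", "hello"); // replayed duplicate, ignored
        System.out.println(d.output.size());
    }
}
```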
regards
Shams
Issues Storm - Kafka
Posted by Nilesh Chhapru <ni...@ugamsolutions.com>.
Hi All,
We are using the Storm-Kafka integration, where a spout reads from a Kafka topic.
Following are the versions of Storm, Kafka and ZooKeeper we are using.
Storm : apache-storm-0.9.2-incubating
Kafka : kafka_2.8.0-0.8.1.1
Zookeeper : zookeeper-3.4.6
I am facing the following issues at the spout.
1) Messages get failed even though the average processing time is less than the topology.message.timeout.secs value; we also aren't getting any exceptions at any of the bolts.
2) A topology is finally emitting to a Kafka producer, i.e. some other topic, but the messages are getting duplicated due to replay issues.
3) The consumer group isn't working properly for the Storm-Kafka integration.
a. When we give the same group id to the Kafka consumers of different topologies, both still read the same messages.
b. If we have two consumers with different consumer group ids in different topologies, it works fine if both topologies are deployed at the same time, but not if we deploy one of them after some of the messages are already loaded in the topic and read by the first topology.
Kindly help me with the above points, as this is hampering the overall scope and timelines of the project.
Do call or email in case you need any other information.
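[For context on point 3: the behavior being expected here matches Kafka's documented consumer-group semantics, sketched below as a stdlib simulation (not the Kafka API; the consumer and group names are made up). Consumers in the same group share a topic's messages; consumers in different groups each receive every message.]

```java
// Simulation of consumer-group delivery: one copy of each message per
// distinct group; within a group, messages are divided among members
// (round-robin here, standing in for Kafka's partition assignment).
import java.util.*;

public class GroupSketch {
    static Map<String, List<String>> deliver(List<String> msgs,
                                             Map<String, String> consumerToGroup) {
        Map<String, List<String>> received = new HashMap<>();
        consumerToGroup.keySet().forEach(c -> received.put(c, new ArrayList<>()));
        // group -> its member consumers
        Map<String, List<String>> groups = new TreeMap<>();
        consumerToGroup.forEach((c, g) ->
            groups.computeIfAbsent(g, k -> new ArrayList<>()).add(c));
        for (List<String> members : groups.values()) {
            Collections.sort(members);
            for (int i = 0; i < msgs.size(); i++)
                received.get(members.get(i % members.size())).add(msgs.get(i));
        }
        return received;
    }

    public static void main(String[] args) {
        List<String> msgs = Arrays.asList("m1", "m2", "m3", "m4");
        Map<String, String> setup = new HashMap<>();
        setup.put("topoA", "g1"); // same group as topoB: they split the stream
        setup.put("topoB", "g1");
        setup.put("topoC", "g2"); // its own group: sees every message
        Map<String, List<String>> r = deliver(msgs, setup);
        System.out.println(r.get("topoA").size() + ","
                         + r.get("topoB").size() + ","
                         + r.get("topoC").size());
    }
}
```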
Nilesh Chhapru,
Phone: +91 9619030491