Posted to user@spark.apache.org by gaganbm <ga...@gmail.com> on 2014/04/10 08:24:12 UTC

Strange behaviour of different SSCs with same Kafka topic

I am really at my wits' end here.

I have several StreamingContexts, let's say 2, both listening to the same
Kafka topics. I create each Kafka stream with a different consumer group.

Ideally, I should be seeing the Kafka events in both streams. But what I
am getting is really unpredictable. One stream gets a lot of events and
the other gets almost nothing, or far fewer than the first. The frequency
is also very skewed: I get a lot of events in one stream continuously, and
only after some time do I get a few events in the other one.

I don't know where I am going wrong. I can see consumer fetcher threads for
both the streams that listen to the Kafka topics.  

I can give further details if needed. Any help will be great. 

Thanks



--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Strange-behaviour-of-different-SSCs-with-same-Kafka-topic-tp4050.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: Strange behaviour of different SSCs with same Kafka topic

Posted by Tathagata Das <ta...@gmail.com>.
As I said before, starting two SSCs in the same JVM is not supported, in
either local mode or cluster mode. You have two choices.
1. Run one SSC in one JVM: This will use a single Spark cluster (as it will
use a single SparkContext) for the computation. Therefore the computations
can share the cluster's resources. To do two different computations, you can
do one of the following:
(i) If you have to receive two different streams of data and process them
differently, then create two input DStreams and transform each one
accordingly.
(ii) If you just have to do two different transformations on the same stream
of data, then you can create one input DStream and do two sets of
transformations on it.
val inputStream = ...
val transformedStream1 = inputStream.map(....)
val transformedStream2 = inputStream.filter(....)

2. If you want the two streaming computations to run on two different
Spark clusters, then you have to run two different JVM processes, each with
its own streaming context.
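
Option 1(ii) above could look roughly like the following sketch. It is only
an illustration, not code from this thread: it assumes the receiver-based
KafkaUtils.createStream API from the spark-streaming-kafka artifact of older
Spark releases, and the master URL, ZooKeeper address, consumer group, topic
name and the two transformations are all hypothetical placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SingleContextSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical settings. Local mode needs more threads than receivers,
    // otherwise no thread is left to process the received data.
    val conf = new SparkConf().setMaster("local[4]").setAppName("single-ssc-sketch")

    // Exactly one StreamingContext in this JVM.
    val ssc = new StreamingContext(conf, Seconds(2))

    // One receiver-based Kafka input DStream of (key, message) pairs.
    // The ZooKeeper quorum, group id and topic are placeholders.
    val inputStream = KafkaUtils.createStream(
      ssc, "localhost:2181", "example-group", Map("example-topic" -> 1))

    // Two independent sets of transformations on the same input stream.
    val tagged = inputStream.map { case (_, msg) => ("RECORD_NAME_1", msg) }
    val nonEmpty = inputStream.filter { case (_, msg) => msg.nonEmpty }

    // Each output operation registers its own computation on the context.
    tagged.print()
    nonEmpty.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

For option 1(i), a second KafkaUtils.createStream call on the same ssc, with
its own group id, would give a second input DStream inside the same context.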

TD


On Mon, Apr 21, 2014 at 9:17 PM, gaganbm <ga...@gmail.com> wrote:

> Yes. I am running this in a local mode and the SSCs run on the same JVM.
> So, if I deploy this on a cluster, such behavior would be gone ? Also, is
> there anyway I can start the SSCs on a local machine but on different JVMs?
> I couldn't find anything about this in the documentation.
>
> The inter-mingling of data seems to be gone after I made some of those
> external classes as 'scala objects' and keeping static maps and all. Is
> that a good idea as far as performance is concerned ?
>
> Thanks
>
> Gagan B Mishra

Re: Strange behaviour of different SSCs with same Kafka topic

Posted by gaganbm <ga...@gmail.com>.
Yes. I am running this in local mode and the SSCs run in the same JVM.
So, if I deploy this on a cluster, would this behavior go away? Also, is
there any way I can start the SSCs on a local machine but in different JVMs?
I couldn't find anything about this in the documentation.

The intermingling of data seems to be gone after I turned some of those
external classes into Scala objects that keep static maps and so on. Is
that a good idea as far as performance is concerned?

Thanks

Gagan B Mishra


On Tue, Apr 22, 2014 at 1:59 AM, Tathagata Das [via Apache Spark User List]
<ml...@n3.nabble.com> wrote:

> Are you by any chance starting two StreamingContexts in the same JVM? That
> could explain a lot of the weird mixing of data that you are seeing. Its
> not a supported usage scenario to start multiple streamingContexts
> simultaneously in the same JVM.
>
> TD





Re: Strange behaviour of different SSCs with same Kafka topic

Posted by Tathagata Das <ta...@gmail.com>.
Are you by any chance starting two StreamingContexts in the same JVM? That
could explain a lot of the weird mixing of data that you are seeing. It's
not a supported usage scenario to start multiple StreamingContexts
simultaneously in the same JVM.

TD


On Thu, Apr 17, 2014 at 10:58 PM, gaganbm <ga...@gmail.com> wrote:

> It happens with normal data rate, i.e., lets say 20 records per second.
>
> Apart from that, I am also getting some more strange behavior. Let me
> explain.
>
> I establish two sscs. Start them one after another. In SSCs I get the
> streams from Kafka sources, and do some manipulations. Like, adding some
> "Record_Name" for example, to each of the incoming records. Now this
> Record_Name is different for both the SSCs, and I get this field from some
> other class, not relevant to the streams.
>
> Now, expected behavior should be, all records in SSC1 gets added with the
> field RECORD_NAME_1 and all records in SSC2 should get added with the field
> RECORD_NAME_2. Both the SSCs have nothing to do with each other as I
> believe.
>
> However, strangely enough, I find many records in SSC1 get added with
> RECORD_NAME_2 and vice versa. Is it some kind of serialization issue ?
> That, the class which provides this RECORD_NAME gets serialized and is
> reconstructed and then some weird thing happens inside ? I am unable to
> figure out.
>
> So, apart from skewed frequency and volume of records in both the streams,
> I am getting this inter-mingling of data among the streams.
>
> Can you help me in how to use some external data to manipulate the RDD
> records ?
>
> Thanks and regards
>
> Gagan B Mishra
>
>
> *Programmer*
> *560034, Bangalore*
> *India*

Re: Strange behaviour of different SSCs with same Kafka topic

Posted by gaganbm <ga...@gmail.com>.
It happens at a normal data rate, let's say 20 records per second.

Apart from that, I am also seeing some more strange behavior. Let me
explain.

I create two SSCs and start them one after another. In each SSC I get the
streams from Kafka sources and do some manipulations, for example adding a
"Record_Name" field to each incoming record. This Record_Name is different
for each SSC, and I get it from another class that is unrelated to the
streams.

The expected behavior is that all records in SSC1 get the field
RECORD_NAME_1 and all records in SSC2 get the field RECORD_NAME_2. As far
as I can tell, the two SSCs have nothing to do with each other.

However, strangely enough, I find that many records in SSC1 get
RECORD_NAME_2 and vice versa. Is it some kind of serialization issue?
Perhaps the class that provides this RECORD_NAME gets serialized and
reconstructed, and something weird happens in the process? I am unable to
figure it out.

So, apart from the skewed frequency and volume of records in the two
streams, I am getting this intermingling of data between the streams.

Can you help me with how to use some external data to manipulate the RDD
records?
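
One possible way to attach a per-stream name to records is sketched below.
It is an assumption for illustration only, not the code used in this thread
and not a confirmed fix: the Tagged type and the tagger helper are
hypothetical, and plain Scala lists stand in for the two streams. The idea
is that each tagging function closes over its own immutable name, so nothing
is looked up in shared mutable state while a record is processed.

```scala
object TaggingSketch {
  // Hypothetical record type: a payload plus the name attached by a stream.
  final case class Tagged(recordName: String, payload: String)

  // Builds a tagging function that closes over an immutable name.
  def tagger(recordName: String): String => Tagged =
    payload => Tagged(recordName, payload)

  def main(args: Array[String]): Unit = {
    val tag1 = tagger("RECORD_NAME_1")
    val tag2 = tagger("RECORD_NAME_2")

    // Plain collections stand in for the two streams here.
    val out1 = List("a", "b").map(tag1)
    val out2 = List("a", "b").map(tag2)

    println(out1)
    println(out2)
  }
}
```

With a DStream of (key, message) pairs, the same function could be applied
as stream.map { case (_, msg) => tag1(msg) }.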

Thanks and regards

Gagan B Mishra


*Programmer*
*560034, Bangalore*
*India*


On Tue, Apr 15, 2014 at 4:09 AM, Tathagata Das [via Apache Spark User List]
<ml...@n3.nabble.com> wrote:

> Does this happen at low event rate for that topic as well, or only for a
> high volume rate?
>
> TD





Re: Strange behaviour of different SSCs with same Kafka topic

Posted by Tathagata Das <ta...@gmail.com>.
Does this happen at a low event rate for that topic as well, or only at a
high event rate?

TD


On Wed, Apr 9, 2014 at 11:24 PM, gaganbm <ga...@gmail.com> wrote:

> I am really at my wits' end here.
>
> I have different Streaming contexts, lets say 2, and both listening to same
> Kafka topics. I establish the KafkaStream by setting different consumer
> groups to each of them.
>
> Ideally, I should be seeing the kafka events in both the streams. But what I
> am getting is really unpredictable. Only one stream gets a lot of events and
> the other one almost gets nothing or very less compared to the other. Also
> the frequency is very skewed. I get a lot of events in one stream
> continuously, and after some duration I get a few events in the other one.
>
> I don't know where I am going wrong. I can see consumer fetcher threads for
> both the streams that listen to the Kafka topics.
>
> I can give further details if needed. Any help will be great.
>
> Thanks