Posted to user@spark.apache.org by Mohit Anchlia <mo...@gmail.com> on 2015/03/21 21:22:35 UTC

Spark streaming alerting

Is there a module in Spark Streaming that lets you listen to
alerts/conditions as they happen in the streaming job? Generally,
Spark Streaming components execute on large clusters alongside HDFS
or Cassandra; however, when it comes to alerting, you generally can't send
alerts directly from the Spark workers, which means you need a way to
listen for the alerts.

Re: Spark streaming alerting

Posted by Jeffrey Jedele <je...@gmail.com>.
What exactly do you mean by "alerts"?

Something specific to your data, or general events of the Spark cluster? For
the former, something like Akhil suggested should work. For the latter, I
would suggest having a log consolidation system like Logstash in place and
using it to generate alerts.

Regards,
Jeff

2015-03-23 7:39 GMT+01:00 Akhil Das <ak...@sigmoidanalytics.com>:

> What do you mean you can't send it directly from the Spark workers? Here's
> a simple approach you could take:
>
>     val data = ssc.textFileStream("sigmoid/")
>     data.filter(_.contains("ERROR")).foreachRDD(rdd =>
>       alert("Errors: " + rdd.count()))
>
> And the alert() function could be anything triggering an email or sending
> an SMS alert.
>
> Thanks
> Best Regards
>
> On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia <mo...@gmail.com>
> wrote:
>
>> Is there a module in Spark Streaming that lets you listen to
>> alerts/conditions as they happen in the streaming job? Generally,
>> Spark Streaming components execute on large clusters alongside HDFS
>> or Cassandra; however, when it comes to alerting, you generally can't send
>> alerts directly from the Spark workers, which means you need a way to
>> listen for the alerts.
>>
>
>

Re: Spark streaming alerting

Posted by Helena Edelson <he...@datastax.com>.
I created JIRA tickets for my work in both the Spark and spark-cassandra-connector JIRAs; I don't know why you cannot see them.
Users can stream from any Cassandra table, just as one can stream from a Kafka topic; same principle.

Helena
@helenaedelson
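
For comparison, a minimal sketch of the Kafka side in Spark 1.x (assuming the
spark-streaming-kafka artifact is on the classpath; the ZooKeeper address,
consumer group, and topic name below are placeholders):

    import org.apache.spark.streaming.kafka.KafkaUtils

    // Receiver-based stream from the "events" topic, one receiver thread.
    // A CassandraInputDStream would play the same role, reading a table
    // instead of a topic.
    val kafkaStream = KafkaUtils.createStream(
      ssc, "zkhost:2181", "alert-consumers", Map("events" -> 1))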

> On Mar 24, 2015, at 11:29 AM, Anwar Rizal <an...@gmail.com> wrote:
> 
> Helena,
> 
> The CassandraInputDStream sounds interesting. I don't find much in the JIRA though. Do you have more details on what it tries to achieve?
> 
> Thanks,
> Anwar.
> 
> On Tue, Mar 24, 2015 at 2:39 PM, Helena Edelson <helena.edelson@datastax.com> wrote:
> Streaming _from_ Cassandra, CassandraInputDStream, is coming BTW: https://issues.apache.org/jira/browse/SPARK-6283
> I am working on it now.
> 
> Helena
> @helenaedelson
> 
>> On Mar 23, 2015, at 5:22 AM, Khanderao Kand Gmail <khanderao.kand@gmail.com> wrote:
>> 
>> Akhil 
>> 
>> You are right in your answer to what Mohit wrote. However, what Mohit seems to be alluding to, but did not write clearly, might be different.
>> 
>> Mohit
>> 
>> You are wrong in saying that streaming "generally" works with HDFS and Cassandra. Streaming typically works with a streaming or queuing source like Kafka, Kinesis, Twitter, Flume, ZeroMQ, etc. (though it can also read from HDFS and S3). The streaming context (the "receiver" within the streaming context) receives events/messages/records and forms a time-window-based batch (an RDD).
>> 
>> So there is a maximum gap of one window interval between when an alert message becomes available to Spark and when the processing happens. I think this is what you meant.
>> 
>> As per the Spark programming model, the RDD is the right way to deal with data. If you are fine with a minimum delay of, say, a second (based on the minimum batch interval that DStreams can support), then what Akhil gave is the right model.
>> 
>> Khanderao
>> 
>> On Mar 22, 2015, at 11:39 PM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
>> 
>>> What do you mean you can't send it directly from the Spark workers? Here's a simple approach you could take:
>>> 
>>>     val data = ssc.textFileStream("sigmoid/")
>>>     data.filter(_.contains("ERROR")).foreachRDD(rdd => alert("Errors: " + rdd.count()))
>>> 
>>> And the alert() function could be anything triggering an email or sending an SMS alert.
>>> 
>>> Thanks
>>> Best Regards
>>> 
>>> On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>>> Is there a module in Spark Streaming that lets you listen to alerts/conditions as they happen in the streaming job? Generally, Spark Streaming components execute on large clusters alongside HDFS or Cassandra; however, when it comes to alerting, you generally can't send alerts directly from the Spark workers, which means you need a way to listen for the alerts.
>>> 
> 
> 


Re: Spark streaming alerting

Posted by Anwar Rizal <an...@gmail.com>.
Helena,

The CassandraInputDStream sounds interesting. I don't find much in the JIRA
though. Do you have more details on what it tries to achieve?

Thanks,
Anwar.

On Tue, Mar 24, 2015 at 2:39 PM, Helena Edelson <helena.edelson@datastax.com
> wrote:

> Streaming _from_ cassandra, CassandraInputDStream, is coming BTW
> https://issues.apache.org/jira/browse/SPARK-6283
> I am working on it now.
>
> Helena
> @helenaedelson
>
> On Mar 23, 2015, at 5:22 AM, Khanderao Kand Gmail <
> khanderao.kand@gmail.com> wrote:
>
> Akhil
>
> You are right in your answer to what Mohit wrote. However, what Mohit seems
> to be alluding to, but did not write clearly, might be different.
>
> Mohit
>
> You are wrong in saying that streaming "generally" works with HDFS and
> Cassandra. Streaming typically works with a streaming or queuing source like
> Kafka, Kinesis, Twitter, Flume, ZeroMQ, etc. (though it can also read from
> HDFS and S3). The streaming context (the "receiver" within the streaming
> context) receives events/messages/records and forms a time-window-based
> batch (an RDD).
>
> So there is a maximum gap of one window interval between when an alert
> message becomes available to Spark and when the processing happens. I think
> this is what you meant.
>
> As per the Spark programming model, the RDD is the right way to deal with
> data. If you are fine with a minimum delay of, say, a second (based on the
> minimum batch interval that DStreams can support), then what Akhil gave is
> the right model.
>
> Khanderao
>
> On Mar 22, 2015, at 11:39 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
> What do you mean you can't send it directly from the Spark workers? Here's
> a simple approach you could take:
>
>     val data = ssc.textFileStream("sigmoid/")
>     data.filter(_.contains("ERROR")).foreachRDD(rdd =>
>       alert("Errors: " + rdd.count()))
>
> And the alert() function could be anything triggering an email or sending
> an SMS alert.
>
> Thanks
> Best Regards
>
> On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia <mo...@gmail.com>
> wrote:
>
>> Is there a module in Spark Streaming that lets you listen to
>> alerts/conditions as they happen in the streaming job? Generally,
>> Spark Streaming components execute on large clusters alongside HDFS
>> or Cassandra; however, when it comes to alerting, you generally can't send
>> alerts directly from the Spark workers, which means you need a way to
>> listen for the alerts.
>>
>
>
>

Re: Spark streaming alerting

Posted by Helena Edelson <he...@datastax.com>.
Streaming _from_ Cassandra, CassandraInputDStream, is coming BTW: https://issues.apache.org/jira/browse/SPARK-6283
I am working on it now.

Helena
@helenaedelson

> On Mar 23, 2015, at 5:22 AM, Khanderao Kand Gmail <kh...@gmail.com> wrote:
> 
> Akhil 
> 
> You are right in your answer to what Mohit wrote. However, what Mohit seems to be alluding to, but did not write clearly, might be different.
> 
> Mohit
> 
> You are wrong in saying that streaming "generally" works with HDFS and Cassandra. Streaming typically works with a streaming or queuing source like Kafka, Kinesis, Twitter, Flume, ZeroMQ, etc. (though it can also read from HDFS and S3). The streaming context (the "receiver" within the streaming context) receives events/messages/records and forms a time-window-based batch (an RDD).
> 
> So there is a maximum gap of one window interval between when an alert message becomes available to Spark and when the processing happens. I think this is what you meant.
> 
> As per the Spark programming model, the RDD is the right way to deal with data. If you are fine with a minimum delay of, say, a second (based on the minimum batch interval that DStreams can support), then what Akhil gave is the right model.
> 
> Khanderao
> 
> On Mar 22, 2015, at 11:39 PM, Akhil Das <akhil@sigmoidanalytics.com> wrote:
> 
>> What do you mean you can't send it directly from the Spark workers? Here's a simple approach you could take:
>> 
>>     val data = ssc.textFileStream("sigmoid/")
>>     data.filter(_.contains("ERROR")).foreachRDD(rdd => alert("Errors: " + rdd.count()))
>> 
>> And the alert() function could be anything triggering an email or sending an SMS alert.
>> 
>> Thanks
>> Best Regards
>> 
>> On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia <mohitanchlia@gmail.com> wrote:
>> Is there a module in Spark Streaming that lets you listen to alerts/conditions as they happen in the streaming job? Generally, Spark Streaming components execute on large clusters alongside HDFS or Cassandra; however, when it comes to alerting, you generally can't send alerts directly from the Spark workers, which means you need a way to listen for the alerts.
>> 


Re: Spark streaming alerting

Posted by Tathagata Das <td...@databricks.com>.
Something like that is not really supported out of the box. You will have
to implement your own RPC mechanism (sending stuff back to the driver for
forwarding) for that.

TD
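
A minimal sketch of that pattern, reusing the input from Akhil's snippet
(forwardAlert is a hypothetical placeholder for whatever email/SMS/queue
client you choose):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    object DriverSideAlerts {
      // Hypothetical forwarder; the driver can usually reach the outside
      // network even when the workers cannot.
      def forwardAlert(lines: Array[String]): Unit =
        println("ALERT: " + lines.mkString("; "))

      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(
          new SparkConf().setAppName("alerts"), Seconds(1))
        val data = ssc.textFileStream("sigmoid/")
        // The body of foreachRDD runs on the driver; only the RDD actions
        // inside it (take, count, ...) run on the workers. Pulling a small
        // sample back is the "send stuff to the driver" step.
        data.filter(_.contains("ERROR")).foreachRDD { rdd =>
          val sample = rdd.take(10)
          if (sample.nonEmpty) forwardAlert(sample)
        }
        ssc.start()
        ssc.awaitTermination()
      }
    }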

On Mon, Mar 23, 2015 at 9:43 AM, Mohit Anchlia <mo...@gmail.com>
wrote:

> I think I didn't explain myself properly :) What I meant to say was that
> generally a Spark worker runs either on HDFS data nodes or on Cassandra
> nodes, which are typically in a private (protected) network. When a
> condition is matched, it's difficult to send out alerts directly from the
> worker nodes because of security concerns. I was wondering if there is a
> way to listen to the events as they occur at the sliding-window scale, or
> is the best way to accomplish this to post back to a queue?
>
> On Mon, Mar 23, 2015 at 2:22 AM, Khanderao Kand Gmail <
> khanderao.kand@gmail.com> wrote:
>
>> Akhil
>>
>> You are right in your answer to what Mohit wrote. However, what Mohit
>> seems to be alluding to, but did not write clearly, might be different.
>>
>> Mohit
>>
>> You are wrong in saying that streaming "generally" works with HDFS and
>> Cassandra. Streaming typically works with a streaming or queuing source
>> like Kafka, Kinesis, Twitter, Flume, ZeroMQ, etc. (though it can also read
>> from HDFS and S3). The streaming context (the "receiver" within the
>> streaming context) receives events/messages/records and forms a
>> time-window-based batch (an RDD).
>>
>> So there is a maximum gap of one window interval between when an alert
>> message becomes available to Spark and when the processing happens. I
>> think this is what you meant.
>>
>> As per the Spark programming model, the RDD is the right way to deal with
>> data. If you are fine with a minimum delay of, say, a second (based on the
>> minimum batch interval that DStreams can support), then what Akhil gave is
>> the right model.
>>
>> Khanderao
>>
>> On Mar 22, 2015, at 11:39 PM, Akhil Das <ak...@sigmoidanalytics.com>
>> wrote:
>>
>> What do you mean you can't send it directly from the Spark workers? Here's
>> a simple approach you could take:
>>
>>     val data = ssc.textFileStream("sigmoid/")
>>     data.filter(_.contains("ERROR")).foreachRDD(rdd =>
>>       alert("Errors: " + rdd.count()))
>>
>> And the alert() function could be anything triggering an email or sending
>> an SMS alert.
>>
>> Thanks
>> Best Regards
>>
>> On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia <mo...@gmail.com>
>> wrote:
>>
>>> Is there a module in Spark Streaming that lets you listen to
>>> alerts/conditions as they happen in the streaming job? Generally,
>>> Spark Streaming components execute on large clusters alongside HDFS
>>> or Cassandra; however, when it comes to alerting, you generally can't send
>>> alerts directly from the Spark workers, which means you need a way to
>>> listen for the alerts.
>>>
>>
>>
>

Re: Spark streaming alerting

Posted by Mohit Anchlia <mo...@gmail.com>.
I think I didn't explain myself properly :) What I meant to say was that
generally a Spark worker runs either on HDFS data nodes or on Cassandra
nodes, which are typically in a private (protected) network. When a
condition is matched, it's difficult to send out alerts directly from the
worker nodes because of security concerns. I was wondering if there is a way
to listen to the events as they occur at the sliding-window scale, or is the
best way to accomplish this to post back to a queue?
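
Posting back to a queue is a common way to do this. A minimal driver-side
sketch, assuming Kafka 0.8.2+ on the classpath and reusing the data DStream
from Akhil's snippet (the broker address and the "alerts" topic name are
placeholders):

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    // The producer lives on the driver, so the workers never need an
    // outbound network path; a consumer outside the private network reads
    // the "alerts" topic and sends the actual notifications.
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")
    props.put("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)

    data.filter(_.contains("ERROR")).foreachRDD { rdd =>
      val n = rdd.count()  // the count itself runs on the workers
      if (n > 0)           // this branch runs on the driver
        producer.send(new ProducerRecord[String, String]("alerts", "Errors: " + n))
    }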

On Mon, Mar 23, 2015 at 2:22 AM, Khanderao Kand Gmail <
khanderao.kand@gmail.com> wrote:

> Akhil
>
> You are right in your answer to what Mohit wrote. However, what Mohit seems
> to be alluding to, but did not write clearly, might be different.
>
> Mohit
>
> You are wrong in saying that streaming "generally" works with HDFS and
> Cassandra. Streaming typically works with a streaming or queuing source like
> Kafka, Kinesis, Twitter, Flume, ZeroMQ, etc. (though it can also read from
> HDFS and S3). The streaming context (the "receiver" within the streaming
> context) receives events/messages/records and forms a time-window-based
> batch (an RDD).
>
> So there is a maximum gap of one window interval between when an alert
> message becomes available to Spark and when the processing happens. I think
> this is what you meant.
>
> As per the Spark programming model, the RDD is the right way to deal with
> data. If you are fine with a minimum delay of, say, a second (based on the
> minimum batch interval that DStreams can support), then what Akhil gave is
> the right model.
>
> Khanderao
>
> On Mar 22, 2015, at 11:39 PM, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
> What do you mean you can't send it directly from the Spark workers? Here's
> a simple approach you could take:
>
>     val data = ssc.textFileStream("sigmoid/")
>     data.filter(_.contains("ERROR")).foreachRDD(rdd =>
>       alert("Errors: " + rdd.count()))
>
> And the alert() function could be anything triggering an email or sending
> an SMS alert.
>
> Thanks
> Best Regards
>
> On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia <mo...@gmail.com>
> wrote:
>
>> Is there a module in Spark Streaming that lets you listen to
>> alerts/conditions as they happen in the streaming job? Generally,
>> Spark Streaming components execute on large clusters alongside HDFS
>> or Cassandra; however, when it comes to alerting, you generally can't send
>> alerts directly from the Spark workers, which means you need a way to
>> listen for the alerts.
>>
>
>

Re: Spark streaming alerting

Posted by Khanderao Kand Gmail <kh...@gmail.com>.
Akhil 

You are right in your answer to what Mohit wrote. However, what Mohit seems to be alluding to, but did not write clearly, might be different.

Mohit

You are wrong in saying that streaming "generally" works with HDFS and Cassandra. Streaming typically works with a streaming or queuing source like Kafka, Kinesis, Twitter, Flume, ZeroMQ, etc. (though it can also read from HDFS and S3). The streaming context (the "receiver" within the streaming context) receives events/messages/records and forms a time-window-based batch (an RDD).

So there is a maximum gap of one window interval between when an alert message becomes available to Spark and when the processing happens. I think this is what you meant.

As per the Spark programming model, the RDD is the right way to deal with data. If you are fine with a minimum delay of, say, a second (based on the minimum batch interval that DStreams can support), then what Akhil gave is the right model.

Khanderao
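
Concretely, the batch interval passed when constructing the StreamingContext
bounds that delay (a sketch; the app name and the one-second interval are
illustrative):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Each record waits at most ~1 batch interval (here 1 second) before
    // the batch containing it is processed, so alert latency is bounded
    // by this setting.
    val conf = new SparkConf().setAppName("alerting")
    val ssc = new StreamingContext(conf, Seconds(1))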

> On Mar 22, 2015, at 11:39 PM, Akhil Das <ak...@sigmoidanalytics.com> wrote:
> 
> What do you mean you can't send it directly from the Spark workers? Here's a simple approach you could take:
> 
>     val data = ssc.textFileStream("sigmoid/")
>     data.filter(_.contains("ERROR")).foreachRDD(rdd => alert("Errors: " + rdd.count()))
> 
> And the alert() function could be anything triggering an email or sending an SMS alert.
> 
> Thanks
> Best Regards
> 
>> On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia <mo...@gmail.com> wrote:
>> Is there a module in Spark Streaming that lets you listen to alerts/conditions as they happen in the streaming job? Generally, Spark Streaming components execute on large clusters alongside HDFS or Cassandra; however, when it comes to alerting, you generally can't send alerts directly from the Spark workers, which means you need a way to listen for the alerts.
> 

Re: Spark streaming alerting

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
What do you mean you can't send it directly from the Spark workers? Here's a
simple approach you could take:

    val data = ssc.textFileStream("sigmoid/")
    data.filter(_.contains("ERROR")).foreachRDD(rdd =>
      alert("Errors: " + rdd.count()))

And the alert() function could be anything triggering an email or sending
an SMS alert.
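
For instance, one possible alert() using only JDK classes (the webhook URL
is a placeholder; a mail or SMS client would slot in the same way):

    import java.net.{HttpURLConnection, URL}

    // POST the message to an HTTP endpoint reachable from wherever this
    // runs (the driver, in the snippet above).
    def alert(msg: String): Unit = {
      val conn = new URL("http://alerting.example.com/notify")
        .openConnection().asInstanceOf[HttpURLConnection]
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.getOutputStream.write(msg.getBytes("UTF-8"))
      conn.getOutputStream.close()
      conn.getResponseCode  // force the request to be sent
      conn.disconnect()
    }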

Thanks
Best Regards

On Sun, Mar 22, 2015 at 1:52 AM, Mohit Anchlia <mo...@gmail.com>
wrote:

> Is there a module in Spark Streaming that lets you listen to
> alerts/conditions as they happen in the streaming job? Generally,
> Spark Streaming components execute on large clusters alongside HDFS
> or Cassandra; however, when it comes to alerting, you generally can't send
> alerts directly from the Spark workers, which means you need a way to
> listen for the alerts.
>