Posted to user@spark.apache.org by Harut <ha...@gmail.com> on 2015/03/20 08:43:43 UTC

Visualizing Spark Streaming data

I'm trying to build a dashboard to visualize a stream of events coming
from mobile devices.
For example, I have an event called add_photo, from which I want to
calculate trending tags for added photos over the last x minutes. Then
I'd like to aggregate that by country, etc. I've built the streaming
part, which reads from Kafka and calculates the needed results as RDDs;
the question now is how to connect it to a UI.

Are there any general practices for passing parameters to Spark from a
custom-built UI, organizing data retrieval, choosing intermediate
storage, etc.?
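
For reference, the streaming part I have is roughly along these lines
(heavily simplified sketch; the broker address, topic name, window sizes
and the tag extraction here are just placeholders for what I actually do):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import kafka.serializer.StringDecoder

    object TrendingTags {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("TrendingTags")
        val ssc = new StreamingContext(conf, Seconds(10))

        // direct Kafka stream of raw event payloads
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
        val events = KafkaUtils
          .createDirectStream[String, String, StringDecoder, StringDecoder](
            ssc, kafkaParams, Set("mobile_events"))
          .map(_._2)

        // keep add_photo events, extract tags, count over the last 10 minutes
        val tagCounts = events
          .filter(_.contains("add_photo"))                     // placeholder for real event parsing
          .flatMap(_.split("\\s+").filter(_.startsWith("#")))  // placeholder tag extraction
          .map(tag => (tag, 1))
          .reduceByKeyAndWindow(_ + _, Seconds(600), Seconds(10))

        tagCounts.foreachRDD { rdd =>
          val top = rdd.sortBy(_._2, ascending = false).take(20)
          // this is the part in question: where should `top` go
          // so that a UI can read it?
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }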

Thanks in advance.




--
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Visualizing-Spark-Streaming-data-tp22160.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@spark.apache.org
For additional commands, e-mail: user-help@spark.apache.org


Re: Visualizing Spark Streaming data

Posted by Jeffrey Jedele <je...@gmail.com>.
I'll stay with my recommendation - that's exactly what Kibana is made for ;)


Re: Visualizing Spark Streaming data

Posted by Roger Hoover <ro...@gmail.com>.
Hi Harut,

Jeff's right that Kibana + Elasticsearch can take you quite far out of the
box.  Depending on your volume of data, you may only be able to keep recent
data around though.

Another option that is custom-built for handling many dimensions at query
time (not as separate metrics) is Druid (http://druid.io/).  It supports
the Lambda architecture.  It does real-time indexing from Kafka and, after
a configurable window, hands off shards to historical nodes.  The
historical shards can also be recomputed in batch mode to fix up
duplicates or late data.
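
To give a flavour of the query-time flexibility: the "group by country"
case Harut mentioned would just be a groupBy query POSTed to the Druid
broker, roughly like the sketch below (data source name, broker host/port,
dimension and metric names, and the interval are all assumptions):

    import java.net.{HttpURLConnection, URL}
    import java.nio.charset.StandardCharsets
    import scala.io.Source

    object DruidGroupByExample {
      def main(args: Array[String]): Unit = {
        // Druid groupBy query: count add_photo events per country, per hour
        val query =
          """{
            |  "queryType": "groupBy",
            |  "dataSource": "mobile_events",
            |  "granularity": "hour",
            |  "dimensions": ["country"],
            |  "filter": { "type": "selector", "dimension": "event", "value": "add_photo" },
            |  "aggregations": [ { "type": "longSum", "name": "events", "fieldName": "count" } ],
            |  "intervals": ["2015-03-20T00:00/2015-03-21T00:00"]
            |}""".stripMargin

        // POST the query to the Druid broker and print the JSON result rows
        val conn = new URL("http://druid-broker:8082/druid/v2/?pretty")
          .openConnection().asInstanceOf[HttpURLConnection]
        conn.setRequestMethod("POST")
        conn.setRequestProperty("Content-Type", "application/json")
        conn.setDoOutput(true)
        conn.getOutputStream.write(query.getBytes(StandardCharsets.UTF_8))
        println(Source.fromInputStream(conn.getInputStream).mkString)
      }
    }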

I wrote a plugin for Grafana that talks to Druid.  It doesn't support all
of Druid's rich query API but it can get you pretty far.

https://github.com/Quantiply/grafana-plugins/

Cheers,

Roger




Re: Visualizing Spark Streaming data

Posted by Harut Martirosyan <ha...@gmail.com>.
But it requires all possible combinations of your filters as separate
metrics; moreover, it can only show time-based information, so you cannot
group by, say, country.


-- 
RGRDZ Harut

Re: Visualizing Spark Streaming data

Posted by Irfan Ahmad <ir...@cloudphysics.com>.
Grafana allows pretty slick interactive use patterns, especially with
Graphite as the back-end. In a multi-user environment, why not have each
user just build their own independent dashboards and name them under some
simple naming convention?
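
For example, with Graphite's plaintext protocol the convention could be as
simple as a per-analyst prefix in the metric path, something like this
sketch (host, port and the path components are made up):

    import java.io.PrintWriter
    import java.net.Socket

    object GraphitePerAnalystMetric {
      def main(args: Array[String]): Unit = {
        // Graphite plaintext protocol: "<metric.path> <value> <unix_timestamp>\n"
        val socket = new Socket("graphite-host", 2003)  // default plaintext port
        val out = new PrintWriter(socket.getOutputStream, true)
        val now = System.currentTimeMillis() / 1000

        // per-analyst prefix keeps each user's metrics (and dashboards) independent
        out.println(s"analysts.harut.add_photo.by_country.AM 42 $now")

        out.close()
        socket.close()
      }
    }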


*Irfan Ahmad*
CTO | Co-Founder | *CloudPhysics* <http://www.cloudphysics.com>
Best of VMworld Finalist
Best Cloud Management Award
NetworkWorld 10 Startups to Watch
EMA Most Notable Vendor


Re: Visualizing Spark Streaming data

Posted by Harut Martirosyan <ha...@gmail.com>.
Hey Jeffrey.
Thanks for the reply.

I already have something similar: I use Grafana and Graphite, and for
simple metric streaming we've got everything set up right.

My question is about interactive patterns. For instance, dynamically
choosing an event to monitor, dynamically choosing a group-by field or any
sort of filter, then viewing the results. This is easy when you have one
user, but if you have a team of analysts all specifying their own criteria,
it becomes hard to manage them all.



-- 
RGRDZ Harut

Re: Visualizing Spark Streaming data

Posted by Jeffrey Jedele <je...@gmail.com>.
Hey Harut,
I don't think there'll be any general practices, as this part heavily
depends on your environment, your skills and what you want to achieve.

If you don't have a general direction yet, I'd suggest you have a look
at Elasticsearch + Kibana. It's very easy to set up, powerful, and
therefore currently getting a lot of traction.
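
Getting the windowed results from the streaming job into Elasticsearch
could look roughly like the sketch below, using the elasticsearch-hadoop
Spark support (the index name, node address and record fields are
assumptions, and the socket source just stands in for the real Kafka
stream):

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.elasticsearch.spark._  // elasticsearch-hadoop: adds saveToEs to RDDs

    object TagsToEs {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("TagsToEs")
          .set("es.nodes", "localhost:9200")  // assumed Elasticsearch address

        val ssc = new StreamingContext(conf, Seconds(10))

        // stand-in source; in the real job this would be the Kafka DStream
        val tagCounts = ssc.socketTextStream("localhost", 9999)
          .flatMap(_.split("\\s+"))
          .map(tag => (tag, 1))
          .reduceByKeyAndWindow(_ + _, Seconds(600), Seconds(10))

        tagCounts.foreachRDD { (rdd, time) =>
          // one document per tag and batch, timestamped so Kibana can plot it over time
          rdd.map { case (tag, count) =>
            Map("tag" -> tag, "count" -> count, "timestamp" -> time.milliseconds)
          }.saveToEs("trending/tags")  // assumed index/type
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }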

Regards,
Jeff
