Posted to user@spark.apache.org by Gordon Benjamin <go...@gmail.com> on 2014/11/24 12:04:40 UTC

Use case question

hi,

We are building an analytics dashboard. Data will be updated every 5
minutes for now, and eventually every minute, maybe more frequently. The
amount of data coming in is not huge: maybe 30 records per minute per
customer, although we could have 500 customers. Is streaming the right
choice for this, instead of reading the incremental data from multiple
partitions?

Re: Use case question

Posted by Gordon Benjamin <go...@gmail.com>.
Great, thanks

On Monday, November 24, 2014, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> I'm not quite sure if I understood you correctly, but here's the thing: if
> you use Spark Streaming, it will refresh your dashboard once per batch. So
> for every batch your dashboard will be updated with the new data. And yes,
> the end user won't notice anything while you do the coalesce/repartition;
> after that your dashboards will be refreshed with the new data.
>
> Thanks
> Best Regards
>
> On Mon, Nov 24, 2014 at 4:54 PM, Gordon Benjamin <
> gordon.benjamin65@gmail.com> wrote:
>
>> Thanks. Yes, d3 ones. Just to clarify: we could take our current system,
>> which is incrementally adding partitions, and overlay a Spark Streaming
>> layer to ingest these partitions? Then nightly, we could coalesce these
>> partitions, for example? I presume that while we are carrying out
>> a coalesce, the end user would not lose access to the underlying data? Let
>> me know if I'm off the mark here.
>>
>> On Monday, November 24, 2014, Akhil Das <akhil@sigmoidanalytics.com> wrote:
>>
>>> Streaming would be easy to implement: all you have to do is create the
>>> stream, do some transformations (depending on your use case), and finally
>>> write it to your dashboard's backend. What kind of dashboards are you
>>> building? For d3.js-based ones, you can open a websocket and write the
>>> stream output to the socket; for QlikView/Tableau-based ones, you can
>>> push the stream to a database.
>>>
>>> Thanks
>>> Best Regards
>>>
>>> On Mon, Nov 24, 2014 at 4:34 PM, Gordon Benjamin <
>>> gordon.benjamin65@gmail.com> wrote:
>>>
>>>> hi,
>>>>
>>>> We are building an analytics dashboard. Data will be updated every 5
>>>> minutes for now, and eventually every minute, maybe more frequently. The
>>>> amount of data coming in is not huge: maybe 30 records per minute per
>>>> customer, although we could have 500 customers. Is streaming the right
>>>> choice for this, instead of reading the incremental data from multiple
>>>> partitions?
>>>>
>>>
>>>
>

Re: Use case question

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
I'm not quite sure if I understood you correctly, but here's the thing: if
you use Spark Streaming, it will refresh your dashboard once per batch. So
for every batch your dashboard will be updated with the new data. And yes,
the end user won't notice anything while you do the coalesce/repartition;
after that your dashboards will be refreshed with the new data.
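
To make the per-batch timing concrete, here is a minimal sketch (Scala,
Spark 1.x Streaming API); the socket source and the println sink are
placeholders for whatever the real dashboard backend would be:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object PerBatchRefresh {
  def main(args: Array[String]): Unit = {
    // One batch per minute: the dashboard gets fresh data once per interval.
    val ssc = new StreamingContext(
      new SparkConf().setAppName("PerBatchRefresh"), Seconds(60))

    val lines = ssc.socketTextStream("localhost", 9999) // placeholder source

    // Per-customer counts for the current batch (assumes "customerId,..." lines).
    val counts = lines.map(_.split(",")(0)).map((_, 1L)).reduceByKey(_ + _)

    counts.foreachRDD { rdd =>
      // Runs once per batch; this is the point where the dashboard refreshes.
      rdd.collect().foreach { case (customer, n) =>
        println(s"$customer -> $n") // stand-in for the real sink
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}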

Thanks
Best Regards

On Mon, Nov 24, 2014 at 4:54 PM, Gordon Benjamin <
gordon.benjamin65@gmail.com> wrote:

> Thanks. Yes, d3 ones. Just to clarify: we could take our current system,
> which is incrementally adding partitions, and overlay a Spark Streaming
> layer to ingest these partitions? Then nightly, we could coalesce these
> partitions, for example? I presume that while we are carrying out
> a coalesce, the end user would not lose access to the underlying data? Let
> me know if I'm off the mark here.
>
> On Monday, November 24, 2014, Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Streaming would be easy to implement: all you have to do is create the
>> stream, do some transformations (depending on your use case), and finally
>> write it to your dashboard's backend. What kind of dashboards are you
>> building? For d3.js-based ones, you can open a websocket and write the
>> stream output to the socket; for QlikView/Tableau-based ones, you can
>> push the stream to a database.
>>
>> Thanks
>> Best Regards
>>
>> On Mon, Nov 24, 2014 at 4:34 PM, Gordon Benjamin <
>> gordon.benjamin65@gmail.com> wrote:
>>
>>> hi,
>>>
>>> We are building an analytics dashboard. Data will be updated every 5
>>> minutes for now, and eventually every minute, maybe more frequently. The
>>> amount of data coming in is not huge: maybe 30 records per minute per
>>> customer, although we could have 500 customers. Is streaming the right
>>> choice for this, instead of reading the incremental data from multiple
>>> partitions?
>>>
>>
>>

Re: Use case question

Posted by Gordon Benjamin <go...@gmail.com>.
Thanks. Yes, d3 ones. Just to clarify: we could take our current system,
which is incrementally adding partitions, and overlay a Spark Streaming
layer to ingest these partitions? Then nightly, we could coalesce these
partitions, for example? I presume that while we are carrying out
a coalesce, the end user would not lose access to the underlying data? Let
me know if I'm off the mark here.
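
A minimal sketch of that nightly step as a separate batch job (the HDFS
paths and the target partition count below are hypothetical). Writing the
coalesced copy to a fresh directory and swapping it in afterwards is what
keeps readers from losing access while the compaction runs:

import org.apache.spark.{SparkConf, SparkContext}

object NightlyCompaction {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("NightlyCompaction"))

    val day = args(0) // e.g. "2014-11-24"
    val input  = s"hdfs:///analytics/incoming/$day/*" // many small incremental files
    val output = s"hdfs:///analytics/compacted/$day"  // fresh output directory

    // Collapse the day's many small partitions into a handful of larger ones.
    sc.textFile(input).coalesce(4).saveAsTextFile(output)

    // Readers keep hitting the original files until the compacted directory
    // is swapped in, so nobody loses access mid-coalesce.
    sc.stop()
  }
}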

On Monday, November 24, 2014, Akhil Das <ak...@sigmoidanalytics.com> wrote:

> Streaming would be easy to implement: all you have to do is create the
> stream, do some transformations (depending on your use case), and finally
> write it to your dashboard's backend. What kind of dashboards are you
> building? For d3.js-based ones, you can open a websocket and write the
> stream output to the socket; for QlikView/Tableau-based ones, you can
> push the stream to a database.
>
> Thanks
> Best Regards
>
> On Mon, Nov 24, 2014 at 4:34 PM, Gordon Benjamin <
> gordon.benjamin65@gmail.com> wrote:
>
>> hi,
>>
>> We are building an analytics dashboard. Data will be updated every 5
>> minutes for now, and eventually every minute, maybe more frequently. The
>> amount of data coming in is not huge: maybe 30 records per minute per
>> customer, although we could have 500 customers. Is streaming the right
>> choice for this, instead of reading the incremental data from multiple
>> partitions?
>>
>
>

Re: Use case question

Posted by Akhil Das <ak...@sigmoidanalytics.com>.
Streaming would be easy to implement: all you have to do is create the
stream, do some transformations (depending on your use case), and finally
write it to your dashboard's backend. What kind of dashboards are you
building? For d3.js-based ones, you can open a websocket and write the
stream output to the socket; for QlikView/Tableau-based ones, you can push
the stream to a database.
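
For a rough idea, here is a minimal sketch of that pipeline in Scala
(Spark 1.x Streaming API) with a JDBC database as the sink for a
QlikView/Tableau-style dashboard. The socket source, JDBC URL, credentials,
and table name are placeholder assumptions, not a prescribed setup:

import java.sql.DriverManager

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamToDashboardDb {
  def main(args: Array[String]): Unit = {
    // 5-minute batches, matching the planned update frequency.
    val ssc = new StreamingContext(
      new SparkConf().setAppName("StreamToDashboardDb"), Seconds(300))

    // Create the stream. A socket source stands in here; Kafka/Flume/file
    // streams are the usual production choices.
    val records = ssc.socketTextStream("ingest-host", 9999)

    // Transform: per-customer record counts per batch
    // (assumes "customerId,..." CSV lines).
    val perCustomer = records.map(_.split(",")(0)).map((_, 1L)).reduceByKey(_ + _)

    // Write each batch's results to the dashboard's backing table.
    perCustomer.foreachRDD { rdd =>
      rdd.foreachPartition { rows =>
        // One connection per partition; the URL and table are hypothetical.
        val conn = DriverManager.getConnection(
          "jdbc:postgresql://dbhost/dashboards", "user", "pass")
        val stmt = conn.prepareStatement(
          "INSERT INTO customer_counts (customer_id, cnt) VALUES (?, ?)")
        rows.foreach { case (customer, count) =>
          stmt.setString(1, customer)
          stmt.setLong(2, count)
          stmt.executeUpdate()
        }
        stmt.close()
        conn.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}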

Thanks
Best Regards

On Mon, Nov 24, 2014 at 4:34 PM, Gordon Benjamin <
gordon.benjamin65@gmail.com> wrote:

> hi,
>
> We are building an analytics dashboard. Data will be updated every 5
> minutes for now, and eventually every minute, maybe more frequently. The
> amount of data coming in is not huge: maybe 30 records per minute per
> customer, although we could have 500 customers. Is streaming the right
> choice for this, instead of reading the incremental data from multiple
> partitions?
>