Posted to user@spark.apache.org by cem <ca...@gmail.com> on 2014/02/17 17:48:36 UTC
1 day window size
Hi all,
I am new to Spark Streaming and am trying to evaluate it, and I have a couple
of questions.
1. Can setting the window and slide durations to 1 day cause any problems? The
amount of data that falls into that interval is small. Do you have other
suggestions?
2. What is the best way to detect correlation? Suppose that I have 2
different events from the same source. I want to take an action when these 2
events happen on the same day. I thought about using a reducer.
Thanks in advance!
Best Regards,
Cem
Re: 1 day window size
Posted by Tathagata Das <ta...@gmail.com>.
1. I don't think we have tested window sizes that long.
2. If you have to keep track of a day's worth of data, it may be better to
use an external system that is better suited to lookups over massive
amounts of data (say, Cassandra). Push all the data to Cassandra under some
unique key, and then, for every record, use Spark Streaming to look it up in
Cassandra and see whether it already exists. That can work.
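
A rough sketch of that lookup pattern, in plain Python rather than Spark code.
Everything named here is an assumption for illustration: the dict `seen` stands
in for the external store (a Cassandra table, say), the key is (source, UTC
day), and "A" and "B" are made-up names for the two event types. It only shows
the per-record logic, not the streaming or Cassandra wiring.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for the external store (e.g. a Cassandra table).
# Maps (source_id, day) -> set of event types seen so far for that key.
seen = {}

# Hypothetical names for the two event types that must co-occur.
REQUIRED = {"A", "B"}


def day_of(ts_seconds):
    """Return the UTC calendar day of an epoch timestamp as YYYY-MM-DD."""
    dt = datetime.fromtimestamp(ts_seconds, tz=timezone.utc)
    return dt.strftime("%Y-%m-%d")


def process(source_id, event_type, ts_seconds):
    """Record one event.

    Returns True only when this event completes the required pair
    for its source and day, so the action fires once per key.
    """
    key = (source_id, day_of(ts_seconds))
    types = seen.setdefault(key, set())
    already_complete = REQUIRED <= types
    types.add(event_type)
    return not already_complete and REQUIRED <= types
```

In a streaming job, something like `process` would run for every record in
each batch, with the dict replaced by reads and writes against the external
store. Because the day is part of the key, no 1 day window is needed.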
TD
On Mon, Feb 17, 2014 at 8:48 AM, cem <ca...@gmail.com> wrote:
> Hi all,
>
> I am new to Spark Streaming and am trying to evaluate it, and I have a
> couple of questions.
>
> 1. Can setting the window and slide durations to 1 day cause any problems? The
> amount of data that falls into that interval is small. Do you have other
> suggestions?
>
> 2. What is the best way to detect correlation? Suppose that I have 2
> different events from the same source. I want to take an action when these 2
> events happen on the same day. I thought about using a reducer.
>
> Thanks in advance!
>
> Best Regards,
> Cem
>