Posted to user@spark.apache.org by cem <ca...@gmail.com> on 2014/02/17 17:48:36 UTC

1 day window size

Hi all,

I am new to Spark Streaming and am trying to evaluate it, and I have a couple
of questions.

1. Can setting the window and slide durations to 1 day cause any problems? The
amount of data that will fall into that interval is small. Do you have other
suggestions?

2. What is the best way to detect correlation? Suppose that I have 2
different events from the same source. I want to take an action when these 2
events happen on the same day. I thought about having a reducer.

Thanks in advance!

Best Regards,
Cem

Re: 1 day window size

Posted by Tathagata Das <ta...@gmail.com>.
1. I don't think we have tested window sizes that long.

2. If you have to keep track of a day's worth of data, it may be better to
use an external system that is more dedicated to lookups over massive
amounts of data (say, Cassandra). Use some unique key to push all the data
to Cassandra, and then for every record, you can use Spark Streaming to
look up Cassandra and see if it already exists or not. That can work.

TD
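
The lookup-then-write idea in point 2 can be sketched roughly as below. This is only an illustration, not code from the thread: a plain Python dict stands in for the Cassandra table so the sketch runs on its own, and the names `store`, `REQUIRED`, `day_key`, and `process_batch`, the event type labels "A" and "B", and the (source, event_type, unix_ts) record layout are all assumptions made for the example. In a real job, the body of `process_batch` would run once per streaming batch and talk to the database through a client library.

```python
from datetime import datetime, timezone

# Stand-in for an external key-value store such as Cassandra (assumption:
# a dict keeps the sketch runnable; a real job would use a database client).
store = {}

# Hypothetical labels for the two event types that must co-occur.
REQUIRED = frozenset({"A", "B"})


def day_key(source, ts):
    """Build the lookup key: one entry per source per UTC calendar day."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return (source, day)


def process_batch(records, on_match):
    """Handle one batch of (source, event_type, unix_ts) records."""
    for source, event_type, ts in records:
        key = day_key(source, ts)
        seen = store.setdefault(key, set())  # look up, creating if absent
        if event_type in seen:
            continue  # already recorded for this source and day
        seen.add(event_type)
        # Fire only when this record completes the required set, so the
        # action runs at most once per source per day.
        if event_type in REQUIRED and REQUIRED <= seen:
            on_match(key)
```

Keying on (source, day) means each batch only touches the keys it actually contains, so no day-long window has to be held inside the streaming job itself.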


