Posted to user@spark.apache.org by Laeeq Ahmed <la...@yahoo.com> on 2014/07/01 11:51:41 UTC

Window Size

Hi,

The window size in Spark Streaming is time-based, which means each window can contain a different number of elements. For example, suppose you have two (or more) streams that are related to each other, and you want to compare them over a specific time interval. I am not clear how this works: although the streams start running simultaneously, they may have different numbers of elements in each time interval.
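In Spark Streaming, window and slide durations are multiples of the batch interval, so the window an element falls into depends on when its batch was processed, not on any timestamp the element itself carries. For contrast, the plain-Python sketch below (not Spark code) assumes, hypothetically, that every element carries its own timestamp, and groups each stream into tumbling windows by that timestamp. The helper names `tumbling_windows` and `align` are illustrative only. Because both streams are cut on the same time boundaries, window k of one stream covers the same interval as window k of the other, even when their element counts differ.

```python
from collections import defaultdict


def tumbling_windows(events, width):
    """Group (timestamp, value) pairs into tumbling windows of `width` seconds.

    Returns {window_start: [values]}. The window is chosen from the element's
    own timestamp, so arrival order or arrival time does not matter.
    """
    windows = defaultdict(list)
    for ts, value in events:
        start = int(ts // width) * width  # left edge of the window containing ts
        windows[start].append(value)
    return dict(windows)


def align(windows_a, windows_b):
    """Pair up the windows of two streams that cover the same time interval."""
    common = sorted(set(windows_a) & set(windows_b))
    return [(start, windows_a[start], windows_b[start]) for start in common]


# Two hypothetical streams with different rates over the same 0-20 s span.
stream1 = [(t * 0.5, t) for t in range(40)]  # one element every 0.5 s
stream2 = [(t * 2.0, t) for t in range(10)]  # one element every 2 s

for start, a, b in align(tumbling_windows(stream1, 10), tumbling_windows(stream2, 10)):
    print(start, len(a), len(b))  # same interval, different counts
```

Here both streams yield windows starting at 0 and 10 seconds, with 20 elements per window in the first stream and 5 in the second, so the comparison is made per interval rather than per window index of arrival.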

The following is the output for two streams that have the same number of elements and were run simultaneously. The leftmost value is the number of elements in each window. If we add up the element counts, the totals are the same for both streams, but we can't compare the two streams window by window, because they differ both in window size (elements per window) and in the number of windows.

Can we somehow make windows based on real time values for both streams? Or can we make windows based on the number of elements?
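To illustrate what the count-based alternative would mean, here is a minimal plain-Python sketch (not Spark API; `count_windows` and `paired_windows` are hypothetical helpers). Each stream is cut into consecutive windows of exactly `size` elements, optionally sliding by `step`, and the k-th window of one stream is paired with the k-th window of the other. Every window then has the same n, which is what the time-based output below lacks.

```python
def count_windows(values, size, step=None):
    """Cut a sequence into windows of exactly `size` elements.

    `step` defaults to `size` (non-overlapping windows); a smaller step gives
    sliding windows. Trailing elements that cannot fill a whole window are
    dropped, so every window has the same number of elements.
    """
    step = step or size
    return [values[i:i + size] for i in range(0, len(values) - size + 1, step)]


def paired_windows(stream_a, stream_b, size, step=None):
    """Pair the k-th count window of one stream with the k-th of the other."""
    return list(zip(count_windows(stream_a, size, step),
                    count_windows(stream_b, size, step)))


a = list(range(10))        # hypothetical stream 1
b = list(range(100, 110))  # hypothetical stream 2
pairs = paired_windows(a, b, 4)
print(len(pairs))  # → 2 (elements 8 and 9 do not fill a third window)
print(pairs[0])    # → ([0, 1, 2, 3], [100, 101, 102, 103])
```

The trade-off is that a count window no longer corresponds to a fixed span of time: if one stream is faster, its k-th window covers a shorter interval than the other stream's k-th window.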

(n, (mean, variance, SD))

Stream 1

(7462,(1.0535658165371238,4242.001306434091,65.13064798107025))
(44826,(0.2546925855084064,5042.890184382894,71.0133099100647))
(245466,(0.2857731601728941,5014.411691661449,70.81251084138628))
(154852,(0.21907814309792514,3483.800160602281,59.023725404300606))
(156345,(0.3075668844414613,7449.528181550462,86.31064929399189))
(156603,(0.27785151491351234,5917.809892281489,76.9273026452994))
(156047,(0.18130350363672296,4019.0232843737017,63.39576708561623))


Stream 2

(10493,(0.5554953964547791,1254.883548218503,35.42433553672536))
(180649,(0.21684831234050583,1095.9634245399352,33.1053383087975))
(179994,(0.22048869512317407,1443.0566458182718,37.98758541705792))
(179455,(0.20473330254938552,1623.9538730448216,40.29831104456888))
(269817,(0.16987953223480945,3270.663944782799,57.18971887308766))
(101193,(0.21469292497504766,1263.0879032808723,35.53994799209577))
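Each line above has the form (n, (mean, variance, SD)), and in the figures shown the SD is the square root of the variance (for example, sqrt(4242.0013) ≈ 65.1306). A minimal sketch of how one such tuple can be computed for a single window is below. It assumes population variance (dividing by n); the post does not say which definition produced these numbers, and `window_stats` is a hypothetical name.

```python
import math


def window_stats(values):
    """Return (n, (mean, variance, SD)) for one window of numbers.

    Assumes population variance (divide by n), not sample variance (n - 1).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return (n, (mean, variance, math.sqrt(variance)))


print(window_stats([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # → (8, (5.0, 4.0, 2.0))
```

Because n is part of the tuple, windows with very different n (for example 7462 versus 245466 elements in Stream 1) produce statistics of very different reliability, which is one more reason window-by-window comparison is hard here.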


Regards,
Laeeq