You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by Cassa L <lc...@gmail.com> on 2016/01/12 00:09:06 UTC

Regarding sliding window example from Databricks for DStream

Hi,
 I'm trying to work with sliding window example given by databricks.
https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/windows.html

It works fine as expected.
My question is how do I determine when the last phase of of slider has
reached. I want to perform final operation and notify other system when end
of the slider has reched to the window duarions. e.g. in below example
from databricks,


JavaDStream<ApacheAccessLog> windowDStream =
    accessLogDStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
windowDStream.foreachRDD(accessLogs -> {
  if (accessLogs.count() == 0) {
    System.out.println("No access logs in this time interval");
    return null;
  }

  // Insert code verbatim from LogAnalyzer.java or LogAnalyzerSQL.java here.

  // Calculate statistics based on the content size.
  JavaRDD<Long> contentSizes =
      accessLogs.map(ApacheAccessLog::getContentSize).cache();
  System.out.println(String.format("Content Size Avg: %s, Min: %s, Max: %s",
      contentSizes.reduce(SUM_REDUCER) / contentSizes.count(),
      contentSizes.min(Comparator.naturalOrder()),
      contentSizes.max(Comparator.naturalOrder())));

   //.....
}

I want to check if entire average at the end of window falls below
certain value and send alert. How do I get this?


Thanks,
LCassa

Re: Regarding sliding window example from Databricks for DStream

Posted by Cassa L <lc...@gmail.com>.
Any thoughts over this? I want to know when  window duration is complete
and not the sliding window.  Is there a way I can catch end of Window
Duration or do I need to keep track of it and how?

LCassa

On Mon, Jan 11, 2016 at 3:09 PM, Cassa L <lc...@gmail.com> wrote:

> Hi,
>  I'm trying to work with sliding window example given by databricks.
>
> https://databricks.gitbooks.io/databricks-spark-reference-applications/content/logs_analyzer/chapter1/windows.html
>
> It works fine as expected.
> My question is how do I determine when the last phase of of slider has
> reached. I want to perform final operation and notify other system when end
> of the slider has reched to the window duarions. e.g. in below example
> from databricks,
>
>
> JavaDStream<ApacheAccessLog> windowDStream =
>     accessLogDStream.window(WINDOW_LENGTH, SLIDE_INTERVAL);
> windowDStream.foreachRDD(accessLogs -> {
>   if (accessLogs.count() == 0) {
>     System.out.println("No access logs in this time interval");
>     return null;
>   }
>
>   // Insert code verbatim from LogAnalyzer.java or LogAnalyzerSQL.java here.
>
>   // Calculate statistics based on the content size.
>   JavaRDD<Long> contentSizes =
>       accessLogs.map(ApacheAccessLog::getContentSize).cache();
>   System.out.println(String.format("Content Size Avg: %s, Min: %s, Max: %s",
>       contentSizes.reduce(SUM_REDUCER) / contentSizes.count(),
>       contentSizes.min(Comparator.naturalOrder()),
>       contentSizes.max(Comparator.naturalOrder())));
>
>    //.....
> }
>
> I want to check if entire average at the end of window falls below certain value and send alert. How do I get this?
>
>
> Thanks,
> LCassa
>
>