Posted to user@storm.apache.org by Johannes Hugo Kitschke <jo...@rwth-aachen.de> on 2014/11/10 12:26:46 UTC

Sliding window on numerical data?

Hi there,

I want to solve a task and wonder if Storm is suitable for this. My task:

     - input (many) numerical datastreams (e.g. stock market data, 
seismic data, ...)
     - define (many) queries of length N (may be different for each query)
     - compute distance (e.g. using dynamic time warping) between each 
query and the last N datapoints of each datastream
     - report matches if distance < eps
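The task outlined above can be sketched in a few lines of plain Python, independent of Storm. This is only an illustration under assumptions: the function names (`dtw_distance`, `scan`) are hypothetical, the local cost is the absolute difference, and the textbook quadratic-time form of dynamic time warping is used without any windowing constraint.

```python
import math
from collections import deque


def dtw_distance(a, b):
    """Textbook dynamic time warping with |x - y| as the local cost."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # repeat b[j-1]
                                 cost[i][j - 1],      # repeat a[i-1]
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def scan(stream, query, eps):
    """Yield the index of every data point whose last-N window matches the query."""
    window = deque(maxlen=len(query))  # keeps only the last N data points
    for i, x in enumerate(stream):
        window.append(x)
        if len(window) == len(query) and dtw_distance(query, list(window)) < eps:
            yield i
```

For example, `list(scan([0, 1, 2, 3, 0], [1, 2, 3], 0.5))` reports a match at index 3, the point at which the window equals the query. Recomputing the full distance for every new data point costs O(N^2) per query and stream, which is the part worth distributing.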

This is a rough outline. I am completely new to Storm and wonder whether
Storm is a good candidate for solving this problem (if not, what are the
alternatives?). The main reason I am writing is that I have to access the last
N elements of each stream for each newly arriving element, but every
example I found only does some computation on one element at a time.

While writing this, I came across the 'RollingCountBolt'. Does this
implement the functionality I want? Does this mean each Bolt doing the
distance computation for the last N points has to store these points?

I would be interested in your thoughts, hints and ideas.

Thanks!
  Johannes

Re: Sliding window on numerical data?

Posted by Manoj Jaiswal <ma...@gmail.com>.
Johannes,


Esper has several concepts such as rolling and sliding windows.
Whatever you are trying to achieve in Storm by using transactions etc.
can be achieved easily in Esper.
In our bolt, we initialize the queries on receiving one type of message;
the real-time data is then fed through sliding and other types of windows to
give output to a listener.
Esper can help calculate certain statistical functions as well. It can
also query the previous window.
I suggest you look through the Esper use cases to check whether it can meet
your needs.

-Manoj




Re: Sliding window on numerical data?

Posted by Klausen Schaefersinho <kl...@gmail.com>.
Hi,

I guess that in a production scenario your system should be able to monitor a
lot of sensors, right? So each event should also carry a sensor id, and you
should partition the stream after the spout by that sensor id. I would not
split the window storage and the distance computation into different bolts
unless it is really needed. If you just do some in-memory updates, your
system should be pretty fast. If you think it's better to have very focused
bolts, I could imagine that the following topology would do a good job:


[Spout] -> {fieldsGrouping("sensor")} -> [WindowBolt] ->
{fieldsGrouping("sensor")} -> [DistanceBolt]

The window bolt would emit the stream window with the sensor id included.
The distance bolt would have to hold the last window for each sensor_id and
compare it to the current window.
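
The effect of partitioning by sensor id can be simulated outside Storm. The sketch below is not Storm code: `WindowTask`, `route` and `process` are hypothetical names, and the CRC32-based routing only stands in for whatever hashing a fields grouping really uses. It shows the property that matters here: all values of one sensor reach the same task, so each task can keep that sensor's window in local memory.

```python
import zlib
from collections import defaultdict, deque


class WindowTask:
    """Stand-in for one task of a window bolt: one bounded window per sensor id."""

    def __init__(self, n):
        # deque(maxlen=n) silently drops the oldest value once n values are stored
        self.windows = defaultdict(lambda: deque(maxlen=n))

    def execute(self, sensor_id, value):
        window = self.windows[sensor_id]
        window.append(value)
        return sensor_id, list(window)


def route(sensor_id, num_tasks):
    """Deterministic routing: the same sensor id always maps to the same task index."""
    return zlib.crc32(sensor_id.encode("utf-8")) % num_tasks


def process(tasks, sensor_id, value):
    """Deliver one event to the task responsible for its sensor."""
    return tasks[route(sensor_id, len(tasks))].execute(sensor_id, value)
```

With `tasks = [WindowTask(n=3) for _ in range(4)]`, feeding the values 1, 2, 3, 4 for sensor "a" leaves the window `[2, 3, 4]` on exactly one of the four tasks, regardless of how many other sensors are interleaved.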

Cheers



RE: Sliding window on numerical data?

Posted by "Kitschke, Johannes Hugo" <jo...@rwth-aachen.de>.
Hi,

Yes, this is a research-type question. I have already implemented a prototype framework in Python with a data-flow topology similar to Storm's, and now I want to run this in parallel at a larger scale.

I have another question concerning stream grouping: since in my case a distance-computing bolt has to keep the tuples for the last n seconds, I cannot use different tasks for this bolt (!?). What I would need is for the tuples to be replicated for each bolt, while the actual work is distributed across the tasks.
This does not mean that I need this (I don't think I will); I just want to understand what's possible.
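
One way to picture "replicated tuples, distributed work" is the following plain-Python sketch. It assumes the stream is copied to every task (Storm's all grouping replicates a stream across all tasks of a bolt) and that each task evaluates only its own share of the queries. `DistanceTask` and `broadcast` are hypothetical names, the queries are split round-robin by index, and Euclidean distance is used only to keep the example short.

```python
from collections import deque


class DistanceTask:
    """Stand-in for one task of a distance bolt that owns a subset of the queries."""

    def __init__(self, task_index, num_tasks, queries, eps):
        # Round-robin split: task k owns the queries with index k, k + num_tasks, ...
        self.queries = {qi: q for qi, q in enumerate(queries)
                        if qi % num_tasks == task_index}
        self.eps = eps
        longest = max((len(q) for q in self.queries.values()), default=1)
        self.window = deque(maxlen=longest)  # every task keeps its own copy

    def execute(self, value):
        self.window.append(value)
        hits = []
        for qi, q in self.queries.items():
            if len(self.window) >= len(q):
                recent = list(self.window)[-len(q):]
                dist = sum((x - y) ** 2 for x, y in zip(q, recent)) ** 0.5
                if dist < self.eps:
                    hits.append(qi)
        return hits


def broadcast(tasks, value):
    """Replicate one value to every task and merge the reported query indices."""
    return sorted(h for t in tasks for h in t.execute(value))
```

The price of this layout is memory: every task stores the full window, so it pays off mainly when the number of queries, not the number of streams, dominates the work.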

Best,
 Johannes

Re: Sliding window on numerical data?

Posted by Klausen Schaefersinho <kl...@gmail.com>.
Hi,

IBM InfoSphere Streams has some kind of visual query editor. However, I
doubt that you can model a heartbeat in it. If you need to get started quickly,
just write your own bolt and use some kind of ring buffer to store the last
n events. Then write a function that converts it to a vector. Please
remember that the event frequency is not necessarily the sampling frequency of
your sensor. So your events should carry a timestamp from the sensor, and
you should build the vector using these timestamps, not the position in
the ring buffer.
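
A minimal sketch of this idea in plain Python follows. The class and method names are hypothetical, and linear interpolation is only one possible way to place the events on a regular time grid; the right resampling depends on the sensor.

```python
from collections import deque


class TimestampedRingBuffer:
    """Keeps the most recent (timestamp, value) events, dropping the oldest."""

    def __init__(self, capacity):
        self.events = deque(maxlen=capacity)

    def add(self, timestamp, value):
        self.events.append((timestamp, value))

    def to_vector(self, n, step):
        """Resample onto n grid points spaced `step` apart, ending at the newest
        timestamp. Returns None if the buffer does not cover that time span."""
        if not self.events:
            return None
        events = sorted(self.events)  # tolerate out-of-order arrival
        end = events[-1][0]
        start = end - (n - 1) * step
        if events[0][0] > start:
            return None
        vector, j = [], 0
        for k in range(n):
            t = start + k * step
            # Move j to the last event at or before grid time t
            while j + 1 < len(events) and events[j + 1][0] <= t:
                j += 1
            t0, v0 = events[j]
            if j + 1 < len(events) and t > t0:
                # Interpolate linearly between the two neighbouring events
                t1, v1 = events[j + 1]
                vector.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
            else:
                vector.append(v0)
        return vector
```

With events at times 0, 2, 3 and 6, a request for four points at step 1 yields values at times 3, 4, 5 and 6, even though no event arrived at times 4 or 5.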

Do you already know that Dynamic Time Warping will work? Your task sounds
pretty much like research, so until you know what model and representation
you want to use, it might be best to first conduct some experiments
in R or SciPy and build the windows from a sensor log file.

Cheers,

Klaus




Re: Sliding window on numerical data?

Posted by Johannes Hugo Kitschke <jo...@rwth-aachen.de>.
Hi,

Thanks for your response. I had a look at both Esper and Siddhi. Please
correct me if I'm wrong, but I don't think these fit my purpose: both
use an SQL-like query language, right?
What I am aiming at is to define a query visually (consider, for example,
the heartbeat [1]) and then compute the distance between the query and
some datastream (for each new datapoint). But I cannot define such a
complex query using this SQL-like query language!? Or did I miss something?

Johannes

[1] http://en.wikipedia.org/wiki/Electrocardiography#Waves_and_intervals


Re: Sliding window on numerical data?

Posted by Sajith <sa...@gmail.com>.
Hi Johannes,

You can also use the Siddhi complex event processing engine to achieve your
task. [1] is a bolt I wrote using Siddhi.

[1]
https://github.com/sajithshn/siddhi-storm/blob/master/src/main/java/org/wso2/siddhi/storm/component/SiddhiBolt.java


Re: Sliding window on numerical data?

Posted by Manoj Jaiswal <ma...@gmail.com>.
Hi Johannes,

We are doing something similar using Esper in Storm.
The queries are set in EsperBolt and real-time data is processed through
that.

-Manoj


Re: Sliding window on numerical data?

Posted by Klausen Schaefersinho <kl...@gmail.com>.
Hi,

 > The main reason I write this mail is, because I have to access the last
N elements of each stream for each new arriving element

You cannot go back in a stream, so you have to write your own bolt that
stores the last window. That should be pretty straightforward.



