Posted to user@storm.apache.org by Jonathan Poon <jk...@ucdavis.edu> on 2014/06/07 00:01:29 UTC
Time Partitioning of Tuples
Hi Everyone,
I'm currently investigating different data processing tools for an
application I'm interested in. I have many sensors that I collect data
from. However, I would like to group the data from every sensor at
predefined time intervals and process it together.
Using Storm terminology, I would have each sensor send data to a spout.
The spouts would then send tuples to a specific bolt that will process all
of the data within a specific time partition. Each spout will tag each
event with a time id and each bolt will process data after collecting all
of the data with the same time id tags.
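To make the tagging idea concrete, here is a minimal plain-Python sketch (not Storm code). It assumes each event carries a timestamp in seconds and that a time id is simply the index of a fixed-length interval; `INTERVAL_SECONDS`, `time_id`, `tag_event`, and `group_by_time_id` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

INTERVAL_SECONDS = 10  # hypothetical length of one time partition


def time_id(timestamp: float, interval: float = INTERVAL_SECONDS) -> int:
    """Map an event timestamp (seconds) to the index of its time partition."""
    return int(timestamp // interval)


def tag_event(sensor_id: str, timestamp: float, value: float) -> tuple:
    """What a spout might emit: the reading plus its time id."""
    return (time_id(timestamp), sensor_id, timestamp, value)


def group_by_time_id(events):
    """What a downstream bolt would hold once tuples are collected by time id."""
    partitions = defaultdict(list)
    for tid, sensor_id, _timestamp, value in events:
        partitions[tid].append((sensor_id, value))
    return dict(partitions)


# Four readings from two sensors, spread over two 10-second partitions.
readings = [("s1", 0.5, 20.1), ("s2", 3.2, 19.8), ("s1", 10.5, 20.4), ("s2", 19.9, 19.7)]
tagged = [tag_event(*r) for r in readings]
partitions = group_by_time_id(tagged)
```

The sketch only shows the bookkeeping; deciding when a partition is complete is a separate problem.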
Is this possible with Storm?
I appreciate your help!
Jonathan
Re: Time Partitioning of Tuples
Posted by Jonathan Poon <jk...@ucdavis.edu>.
Hi Kyle,
I have a scientific application that has thousands of sensors spitting out
approximately 500,000 events per second. I'm looking for a tool that can
process these events in a real-time manner. In my application, I need to
read events from each of the sensors and create correlations between the
sensors based on the data. As you described, I pretty much need a
real-time batch processing tool.
For my application, the computation for each time slice should take much
longer than the time slice itself. I need a highly scalable tool that can
read all of the data and process the data from each time slice in parallel.
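One way to picture processing time slices in parallel is to route every event by its time id, so that a whole slice lands on a single worker while different slices go to different workers. The sketch below is a plain-Python stand-in for key-based partitioning, not Storm's API; `NUM_WORKERS`, `worker_for`, and `route` are hypothetical names, and the modulo rule is an assumption made for illustration.

```python
from collections import defaultdict

NUM_WORKERS = 4  # hypothetical number of parallel bolt tasks


def worker_for(tid: int, num_workers: int = NUM_WORKERS) -> int:
    """Pick a worker from the time id alone, so a whole slice stays together."""
    return tid % num_workers


def route(events, num_workers: int = NUM_WORKERS):
    """Distribute (time_id, sensor_id, value) events across workers by time id."""
    inboxes = defaultdict(list)
    for event in events:
        inboxes[worker_for(event[0], num_workers)].append(event)
    return dict(inboxes)


# Six time slices, three sensors each.
events = [(tid, f"s{s}", float(s)) for tid in range(6) for s in range(3)]
inboxes = route(events)
```

With six slices and four workers, slices 0 and 4 share a worker, as do slices 1 and 5; no slice is ever split across workers.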
Jonathan
Re: Time Partitioning of Tuples
Posted by Jonathan Poon <jk...@ucdavis.edu>.
I will take a look into Trident as well. Thanks for the tip!
Re: Time Partitioning of Tuples
Posted by Kyle Nusbaum <kn...@yahoo-inc.com>.
Sounds interesting.
I don't know much about your project, so I won't speculate about your
purposes.
One thing to consider is that the duration of the computation on a time
slice must be longer than the time slice itself to really make this type
of setup worthwhile. Otherwise you could just feed the batches through
the same bolt, since it would be done processing a batch before the next
one comes in.
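The trade-off above can be put into rough numbers. As a back-of-the-envelope sketch (my own arithmetic, assuming a fixed compute time per slice and slices arriving at a fixed rate), the number of slices that must be in flight at once is the compute time divided by the slice length, rounded up. `min_parallel_tasks` is a hypothetical helper name.

```python
import math


def min_parallel_tasks(compute_seconds: float, slice_seconds: float) -> int:
    """Smallest number of concurrent tasks that keeps up with arriving slices."""
    if compute_seconds <= 0 or slice_seconds <= 0:
        raise ValueError("durations must be positive")
    return max(1, math.ceil(compute_seconds / slice_seconds))


# Computation finishes within one slice: a single bolt task is enough.
single = min_parallel_tasks(compute_seconds=0.5, slice_seconds=1.0)

# Computation takes 2.5 slices: at least three slices are in flight at once.
several = min_parallel_tasks(compute_seconds=2.5, slice_seconds=1.0)
```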
-- Kyle
Re: Time Partitioning of Tuples
Posted by Shahab Yunus <sh...@gmail.com>.
I might be off-base here (experts can chime in), but have you looked at
Trident, which is an API built on Storm? It provides aggregation that
simulates micro-batches or, in other words, maintains state. You can even
persist this partially aggregated state. If Spark's in-memory M/R is not
fast enough for you, then you can take a look at that, perhaps.
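The idea of maintaining partially aggregated state can be sketched without any Trident code. The plain-Python example below is only an illustration of the concept, not the Trident API: it keeps a running count and sum per (time id, sensor) key so a mean is available at any point without buffering every event. `RunningMean` and `aggregate` are hypothetical names.

```python
from collections import defaultdict


class RunningMean:
    """Partial aggregate for one key: enough state to report a mean at any time."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, value: float) -> None:
        self.count += 1
        self.total += value

    def mean(self) -> float:
        return self.total / self.count


def aggregate(events):
    """Fold (time_id, sensor_id, value) events into per-key partial aggregates."""
    state = defaultdict(RunningMean)
    for tid, sensor_id, value in events:
        state[(tid, sensor_id)].update(value)
    return state


state = aggregate([(0, "s1", 2.0), (0, "s1", 4.0), (0, "s2", 10.0), (1, "s1", 6.0)])
```

Because each key holds only a count and a sum, the state stays small and could be written to an external store between updates.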
Regards,
Shahab
RE: Time Partitioning of Tuples
Posted by Lesego Peter <l....@newcastle.ac.uk>.
Thanks, that has really helped.
-Lesego
RE: Time Partitioning of Tuples
Posted by Dan <dc...@hotmail.com>.
I've not used it, but you might look at:
https://github.com/buildlackey/cep/tree/master/esper%2Bstorm%2Bkafka
-Dan
RE: Time Partitioning of Tuples
Posted by Lesego Peter <l....@newcastle.ac.uk>.
Hi,
Is there anybody who has already embedded Esper into Storm?
-Lesego
RE: Time Partitioning of Tuples
Posted by Dan <dc...@hotmail.com>.
You might look at Esper. I believe someone has even embedded Esper into Storm.
-Dan
Re: Time Partitioning of Tuples
Posted by Jonathan Poon <jk...@ucdavis.edu>.
Hi Kyle,
I'm looking for a real-time batch processing tool. In my case, I'm looking
to make correlations between all of the sensors at each time interval.
I could use Hadoop (MapReduce), but it requires me to collect all of
the data before I can batch process each time partition of data from each
sensor.
Another tool I'm also looking at is Spark Streaming, which allows me to
collect data at different time intervals and process that batch of data
using MapReduce.
However, MapReduce seems inefficient because my sensor data is already
naturally time sorted. In addition, I would like real-time data on the fly.
Seems like Storm might be a candidate for this application. Please let me
know what you think...! Thanks for your help!
Jonathan
Re: Time Partitioning of Tuples
Posted by Kyle Nusbaum <kn...@yahoo-inc.com>.
You could send a signal tuple from the spout when it knows it has sent the
last tuple for a time period, or include a field in the tuple indicating
that it's the last member.
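A minimal plain-Python sketch of the signal-tuple idea (not Storm code): the collector buffers tuples per time id and runs the batch computation only once every spout has sent its end-of-window marker for that id. It assumes the set of spouts is known up front and that each spout's marker arrives after the data it closes; `WindowCollector`, `on_data`, and `on_end_of_window` are hypothetical names.

```python
from collections import defaultdict


class WindowCollector:
    """Buffers tuples per time id until every spout has signalled the end of that window."""

    def __init__(self, spout_ids, process):
        self.expected = set(spout_ids)    # assumption: spouts are known up front
        self.process = process            # callback run once per complete window
        self.buffers = defaultdict(list)  # time id -> buffered (spout_id, value)
        self.finished = defaultdict(set)  # time id -> spouts that sent a marker

    def on_data(self, tid, spout_id, value):
        self.buffers[tid].append((spout_id, value))

    def on_end_of_window(self, tid, spout_id):
        self.finished[tid].add(spout_id)
        if self.finished[tid] == self.expected:
            batch = self.buffers.pop(tid, [])
            del self.finished[tid]
            self.process(tid, batch)


results = {}
collector = WindowCollector(["spoutA", "spoutB"],
                            lambda tid, batch: results.update({tid: batch}))
collector.on_data(0, "spoutA", 1.0)
collector.on_data(0, "spoutB", 2.0)
collector.on_end_of_window(0, "spoutA")  # window 0 is still incomplete
collector.on_data(1, "spoutA", 3.0)      # data for the next window keeps arriving
collector.on_end_of_window(0, "spoutB")  # window 0 is now complete and processed
```

Window 1 stays buffered because no spout has closed it yet, which is the behaviour the marker approach is meant to give.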
I'm curious about why you want to do this, since the purpose of Storm is
to facilitate stream processing rather than the type of batch processing
you're describing.
-- Kyle
On 06/06/2014 05:14 PM, Jonathan Poon wrote:
> Hi Nathan,
>
> The sensor data I have is naturally time sorted, since it's just
> collecting data and emitting it to a spout. Is it possible for a bolt
> to know when all of the tuples with the same time tag have been
> collected and to start processing it together? Or is it only possible
> for a bolt to process each tuple one at a time?
>
> Thanks!
>
>
>
> On Fri, Jun 6, 2014 at 3:07 PM, Nathan Leung <nc...@gmail.com> wrote:
>
> You can have your bolt subscribe to the spout using fields
> grouping and use time tag as your key.
>
> On Jun 6, 2014 6:01 PM, "Jonathan Poon" <jk...@ucdavis.edu> wrote:
>
> Hi Everyone,
>
> I'm currently investigating different data processing tools
> for an application I'm interested in. I have many sensors
> that I collect data from. However, I would like to group the
> data from every sensor at predefined time intervals and
> process it together.
>
> Using Storm terminology, I would have each sensor send data to
> a spout. The spouts would then send tuples to a specific bolt
> that will process all of the data within a specific time
> partition. Each spout will tag each event with a time id and
> each bolt will process data after collecting all of the data
> with the same time id tags.
>
> Is this possible with Storm?
>
> I appreciate your help!
>
> Jonathan
>
>
Re: Time Partitioning of Tuples
Posted by Jonathan Poon <jk...@ucdavis.edu>.
Hi Nathan,
The sensor data I have is naturally time sorted, since it's just collecting
data and emitting it to a spout. Is it possible for a bolt to know when all
of the tuples with the same time tag have been collected and to start
processing it together? Or is it only possible for a bolt to process each
tuple one at a time?
Thanks!
On Fri, Jun 6, 2014 at 3:07 PM, Nathan Leung <nc...@gmail.com> wrote:
> You can have your bolt subscribe to the spout using fields grouping and
> use time tag as your key.
> On Jun 6, 2014 6:01 PM, "Jonathan Poon" <jk...@ucdavis.edu> wrote:
>
>> Hi Everyone,
>>
>> I'm currently investigating different data processing tools for an
>> application I'm interested in. I have many sensors that I collect data
>> from. However, I would like to group the data from every sensor at
>> predefined time intervals and process it together.
>>
>> Using Storm terminology, I would have each sensor send data to a spout.
>> The spouts would then send tuples to a specific bolt that will process all
>> of the data within a specific time partition. Each spout will tag each
>> event with a time id and each bolt will process data after collecting all
>> of the data with the same time id tags.
>>
>> Is this possible with Storm?
>>
>> I appreciate your help!
>>
>> Jonathan
>>
>
Re: Time Partitioning of Tuples
Posted by Nathan Leung <nc...@gmail.com>.
You can have your bolt subscribe to the spout using fields grouping and use
time tag as your key.
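The effect of grouping on the time tag can be illustrated with a small
hash-partitioning sketch. This is plain Python rather than Storm code, and
the function task_for, the CRC32 hash, and NUM_TASKS are assumptions made
for the example; Storm's own hashing may differ. The point is only that
tuples sharing a time id always land on the same bolt task.

```python
import zlib


def task_for(time_id, num_tasks):
    """Pick a task index for a time id; equal ids map to the same task."""
    return zlib.crc32(str(time_id).encode("utf-8")) % num_tasks


NUM_TASKS = 4  # hypothetical parallelism of the receiving bolt

# (sensor, time_id, value) tuples as they might leave the spouts
tuples = [
    ("sensor-a", 0, 1.5),
    ("sensor-b", 0, 2.5),
    ("sensor-a", 1, 1.7),
    ("sensor-b", 1, 2.4),
]

# Route each tuple to a task based only on its time id.
by_task = {}
for sensor, time_id, value in tuples:
    task = task_for(time_id, NUM_TASKS)
    by_task.setdefault(task, []).append((sensor, time_id, value))
```

Different time ids may still share a task, so each task has to keep its
buffered tuples separated by time id.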
On Jun 6, 2014 6:01 PM, "Jonathan Poon" <jk...@ucdavis.edu> wrote:
> Hi Everyone,
>
> I'm currently investigating different data processing tools for an
> application I'm interested in. I have many sensors that I collect data
> from. However, I would like to group the data from every sensor at
> predefined time intervals and process it together.
>
> Using Storm terminology, I would have each sensor send data to a spout.
> The spouts would then send tuples to a specific bolt that will process all
> of the data within a specific time partition. Each spout will tag each
> event with a time id and each bolt will process data after collecting all
> of the data with the same time id tags.
>
> Is this possible with Storm?
>
> I appreciate your help!
>
> Jonathan
>