Posted to user@storm.apache.org by Alec Lee <al...@gmail.com> on 2015/08/26 00:37:12 UTC

sorting events by timestamp

Hi, all

Is there any sample code for sorting events by the timestamp field of a tuple?

thanks


AL

Re: sorting events by timestamp

Posted by Alec Lee <al...@gmail.com>.
Thanks Andrew

Here are the events I got from Kafka; I only print out the timestamp field:

2013-03-22 12:43:00-07:00
2013-03-22 12:44:00-07:00
2013-03-22 12:45:00-07:00
2013-03-22 12:49:00-07:00
2013-03-22 12:47:00-07:00
2013-03-22 12:48:00-07:00
2013-03-22 12:46:00-07:00
2013-03-22 12:51:00-07:00
2013-03-22 12:50:00-07:00
2013-03-22 12:52:00-07:00
2013-03-22 12:55:00-07:00
2013-03-22 12:54:00-07:00
2013-03-22 12:53:00-07:00
2013-03-22 12:58:00-07:00
2013-03-22 12:57:00-07:00
2013-03-22 12:56:00-07:00

Basically, the events are recorded every minute before they are fed into Kafka. They got shuffled in Kafka, so the timestamps are out of order when the Storm KafkaSpout reads them. I want to be able to sort the tuples by timestamp and then emit them in order. It would be great if there were example code I could read.
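A minimal sketch of the sorting step on its own, assuming each timestamp arrives as a string in the format shown above (`yyyy-MM-dd HH:mm:ssXXX`, e.g. `2013-03-22 12:43:00-07:00`). It parses the strings with `java.time` and sorts by the resulting instant rather than by the raw string, so mixed UTC offsets would still order correctly. The class name `TimestampSort` is hypothetical. Sorting a bounded list is the easy part; in an unbounded stream you still have to decide how many events to buffer before emitting.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper: sorts timestamp strings like "2013-03-22 12:43:00-07:00" chronologically.
class TimestampSort {
    // 'XXX' parses an ISO-8601 offset such as -07:00.
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssXXX");

    static OffsetDateTime parse(String ts) {
        return OffsetDateTime.parse(ts, FORMAT);
    }

    // Returns a new list ordered by the instant each string represents.
    static List<String> sortByTimestamp(List<String> timestamps) {
        List<String> sorted = new ArrayList<>(timestamps);
        // Compare instants rather than strings, so different UTC offsets still sort correctly.
        sorted.sort(Comparator.comparing(s -> parse(s).toInstant()));
        return sorted;
    }

    public static void main(String[] args) {
        List<String> events = List.of(
            "2013-03-22 12:49:00-07:00",
            "2013-03-22 12:47:00-07:00",
            "2013-03-22 12:48:00-07:00",
            "2013-03-22 12:46:00-07:00");
        sortByTimestamp(events).forEach(System.out::println);
    }
}
```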

thanks

AL




> On Aug 25, 2015, at 3:47 PM, Andrew Xor <an...@gmail.com> wrote:
> 
> What do you mean by that? It's a bit vague, as timestamps can have quite high resolution (for example minutes, seconds, or milliseconds), so you will probably have to do a bit of bucketization before sorting them. Then, by using a partition aggregator (in Trident at least), you can do this very easily.
>
> Hope this helps.
> 
> Kindly yours,
> 
> Andrew Grammenos
> 
> -- PGP PKey --
> <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
> https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt <https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>
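Andrew's bucketization suggestion can be sketched as follows, assuming timestamps in the same format as above and a fixed bucket width (five minutes here, an arbitrary choice). Each event's epoch second is floored to the start of its bucket, and a `TreeMap` keeps the buckets in chronological order, so sorting within a bucket only has to deal with a handful of events. `Bucketizer`, `bucketStart` and `bucketize` are hypothetical names, not part of Storm or Trident.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: groups timestamp strings into fixed-width time buckets.
class Bucketizer {
    static final DateTimeFormatter FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssXXX");

    // Floors the event's epoch second to the start of its bucket.
    static long bucketStart(String ts, long bucketSeconds) {
        long epoch = OffsetDateTime.parse(ts, FORMAT).toEpochSecond();
        return Math.floorDiv(epoch, bucketSeconds) * bucketSeconds;
    }

    // TreeMap keeps the bucket keys in ascending (chronological) order.
    static TreeMap<Long, List<String>> bucketize(List<String> timestamps, long bucketSeconds) {
        TreeMap<Long, List<String>> buckets = new TreeMap<>();
        for (String ts : timestamps) {
            buckets.computeIfAbsent(bucketStart(ts, bucketSeconds), k -> new ArrayList<>()).add(ts);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> events = List.of(
            "2013-03-22 12:49:00-07:00",
            "2013-03-22 12:43:00-07:00",
            "2013-03-22 12:51:00-07:00",
            "2013-03-22 12:46:00-07:00");
        for (Map.Entry<Long, List<String>> e : bucketize(events, 300).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```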


Re: sorting events by timestamp

Posted by Alec Lee <al...@gmail.com>.
Hi, Andrew

I got a Trident topology working today; it looks like this:

    topology.newStream("spoutInit", kafkaSpout)
            .each(new Fields("str"),
                  new JsonObjectParse(),
                  new Fields("sensor_id",
                             "period",
                             "powermon_num",
                             "current",
                             "id",
                             "measurement_timestamp",
                             "measurement_dateuploaded"))
            .parallelismHint(6)
            .groupBy(new Fields("sensor_id"))
            .each(new Fields("sensor_id", "measurement_timestamp"),
                  new PrintStream(),
                  new Fields("sensorid", "timestamps"));



I am able to print the sensor IDs and timestamps:
[febc0061, 2013-03-14 19:23:00-07:00]
[febc0061, 2013-03-14 19:22:00-07:00]
[febc0061, 2013-03-14 19:21:00-07:00]
[febc0061, 2013-03-14 18:42:00-07:00]
[febc0061, 2013-03-14 18:59:00-07:00]
[febc0061, 2013-03-14 19:20:00-07:00]
[febc0061, 2013-03-14 18:39:00-07:00]
[febc0061, 2013-03-14 18:41:00-07:00]
[febc0061, 2013-03-14 18:40:00-07:00]
[febc0061, 2013-03-14 18:58:00-07:00]
[febc0061, 2013-03-14 18:49:00-07:00]
[febc0061, 2013-03-14 18:44:00-07:00]
[febc0061, 2013-03-14 18:43:00-07:00]
[febc0061, 2013-03-14 18:48:00-07:00]
[febc0061, 2013-03-14 18:46:00-07:00]
[febc0061, 2013-03-14 18:45:00-07:00]
[febc0061, 2013-03-14 18:47:00-07:00]
[febc0061, 2013-03-14 18:52:00-07:00]
[febc0061, 2013-03-14 18:50:00-07:00]
[febc0061, 2013-03-14 18:57:00-07:00]
[febc0061, 2013-03-14 18:51:00-07:00]
[febc0061, 2013-03-14 18:56:00-07:00]
[febc0061, 2013-03-14 18:53:00-07:00]
[febc0061, 2013-03-14 18:55:00-07:00]
[febc0061, 2013-03-14 18:54:00-07:00]
[febc0061, 2013-03-14 19:19:00-07:00]
[febc0061, 2013-03-14 19:03:00-07:00]
[febc0061, 2013-03-14 19:12:00-07:00]

I have been poking around but wasn't able to find any hint on how to use a partition aggregator to sort within a batch. I thought the sorting would be done locally in each partition after the groupBy, so each partition would hold multiple groups, each with sorted records. How do I do that within a batch? In addition, do I need a reduce step at the end to combine all the partitions?
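One way to picture the per-batch sort asked about here: when an aggregator runs on a grouped Trident stream, it runs within each group, so after `groupBy(new Fields("sensor_id"))` it sees one sensor's tuples for the batch together. It can collect them in its `aggregate` method and emit them sorted in `complete`. The sketch below simulates that in plain Java without Storm, under the assumption that a batch is just a list of (sensor ID, minute) pairs; `BatchSorter`, `Reading` and `sortBatchPerSensor` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of sorting one batch per sensor (no Storm dependency).
class BatchSorter {
    // Hypothetical tuple shape: sensor id plus a timestamp in minutes.
    static final class Reading {
        final String sensorId;
        final long minute;
        Reading(String sensorId, long minute) { this.sensorId = sensorId; this.minute = minute; }
        @Override public String toString() { return sensorId + "@" + minute; }
    }

    // Mirrors groupBy(sensor_id) followed by a per-group aggregator:
    // collect each group's tuples, then sort them once the batch is complete.
    static Map<String, List<Reading>> sortBatchPerSensor(List<Reading> batch) {
        Map<String, List<Reading>> groups = new TreeMap<>();
        for (Reading r : batch) {
            groups.computeIfAbsent(r.sensorId, k -> new ArrayList<>()).add(r);  // "aggregate" step
        }
        for (List<Reading> group : groups.values()) {
            group.sort(Comparator.comparingLong(r -> r.minute));               // "complete" step
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Reading> batch = List.of(
            new Reading("febc0061", 1163), new Reading("febc0061", 1162),
            new Reading("a1", 5), new Reading("febc0061", 1122), new Reading("a1", 3));
        System.out.println(sortBatchPerSensor(batch));
    }
}
```

Because the grouping already sends each sensor to a single partition, a per-sensor order like this needs no extra reduce step across partitions; a merge would only matter for one global order across all sensors. Note that this orders events only within one batch.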

thanks


> On Aug 26, 2015, at 11:11 AM, Andrew Xor <an...@gmail.com> wrote:
> 
> I don't think it's easier or harder to learn... but both have pros and cons; in your case the semantics that you are trying to apply in your particular scenario sound more like a use-case for a Trident based topology that's all.
> 
> Regards.
> 
> On Wed, Aug 26, 2015 at 8:35 PM, Alec Lee <alec.in.bc@gmail.com <ma...@gmail.com>> wrote:
> For this type of problem, it seems Trident is better than spout+bolts, even though the latter is easier to understand and learn?
> 
> AL
>> On Aug 25, 2015, at 9:31 PM, Kishore Senji <ksenji@gmail.com <ma...@gmail.com>> wrote:
>> 
>> Agreed. This makes sense if the aggregation is on fields etc. 
>> 
>> Although Alec did not mention it in this post, based on his previous posts on the same topic, I would assume he is trying to sort the events because he wants to "fill in" the missing events (smoothing the curve, so to speak) by looking at the events before and after each missing timestamp, and then do some stream processing on top of that (for example, alerting based on a sliding window). Assuming that is the scenario, I guess he would have to keep more metadata in the State so that he can fill in those events. The question then is when he would stop looking for missing events, fill them in, and move on (as they can arrive in different batches). On top of that, he would have to do any further stream processing (or store the events in ES for later search, for example) in the State itself. This is where I think it gets tricky to do this in the partition aggregator.
>> 
>> So in our earlier posts we suggested that he do the appropriate partitioning in Kafka (so that events from a given device end up in the same partition) and do the window-based sorting (by buffering a few events) in the stream processing.
>> 
>> Alec, please ignore the above if my assumption is not correct.
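The "window-based sorting by buffering a few events" can be sketched as a small per-device reorder buffer: hold up to `capacity` events in a min-heap keyed by timestamp and release the oldest once the buffer overflows. This assumes no event is more than `capacity` positions away from its sorted position; an event that arrives later than that is emitted out of order, and the sketch counts such events. `ReorderBuffer` and its methods are hypothetical; in a bolt you would presumably keep one buffer per `sensor_id`, for example in a `Map`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical count-bounded reorder buffer for one device's events (timestamps as minutes).
class ReorderBuffer {
    private final int capacity;
    private final PriorityQueue<Long> heap = new PriorityQueue<>();
    private long lastEmitted = Long.MIN_VALUE;
    private int outOfOrder = 0;

    ReorderBuffer(int capacity) { this.capacity = capacity; }

    // Adds an event; releases the oldest buffered event once more than `capacity` are held.
    List<Long> offer(long ts) {
        heap.add(ts);
        List<Long> released = new ArrayList<>();
        if (heap.size() > capacity) {
            released.add(emit(heap.poll()));
        }
        return released;
    }

    // Releases everything still buffered, e.g. at shutdown or after a timeout.
    List<Long> flush() {
        List<Long> released = new ArrayList<>();
        while (!heap.isEmpty()) {
            released.add(emit(heap.poll()));
        }
        return released;
    }

    int outOfOrderCount() { return outOfOrder; }

    private long emit(long ts) {
        if (ts < lastEmitted) outOfOrder++;   // arrived too late for the buffer to fix
        lastEmitted = Math.max(lastEmitted, ts);
        return ts;
    }

    public static void main(String[] args) {
        // Minutes past 12:00 from Alec's first listing, in arrival order.
        long[] arrivals = {43, 44, 45, 49, 47, 48, 46, 51, 50, 52, 55, 54, 53, 58, 57, 56};
        ReorderBuffer buf = new ReorderBuffer(3);
        List<Long> out = new ArrayList<>();
        for (long ts : arrivals) out.addAll(buf.offer(ts));
        out.addAll(buf.flush());
        System.out.println(out + " late=" + buf.outOfOrderCount());
    }
}
```

The buffer size trades latency for tolerance: each event is held until `capacity` newer events have arrived.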
>> 
>> 
>> On Tue, Aug 25, 2015 at 6:19 PM, Andrew Xor <andreas.grammenos@gmail.com <ma...@gmail.com>> wrote:
>> This is not an issue, as that probably would be done through a partition aggregator after the groupBy.
>> 
>> On Wed, Aug 26, 2015 at 4:16 AM, Kishore Senji <ksenji@gmail.com <ma...@gmail.com>> wrote:
>> Interesting. But wouldn't this be impacted by the trident batch size?
>> 
>> Assuming the batch boundary falls as below: after bucketing you would groupBy on the start time (but how would you sort it?) and, assuming it can be sorted, you would be done with that batch. So with a batch boundary like the one below, you would end up with two separately sorted sets for events that are supposed to be together (12:44, 12:45 and 12:46 below). If I understand the original question correctly, it is how to sort the full stream of events irrespective of how they are split into batches.
>> 
>> 2013-03-22 12:43:00-07:00
>> 2013-03-22 12:44:00-07:00
>> 2013-03-22 12:45:00-07:00
>> 2013-03-22 12:49:00-07:00
>> 2013-03-22 12:47:00-07:00
>> --------------------------------------
>> 2013-03-22 12:48:00-07:00
>> 2013-03-22 12:46:00-07:00
>> 2013-03-22 12:51:00-07:00
>> 2013-03-22 12:50:00-07:00
>> 2013-03-22 12:52:00-07:00
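The batch-boundary problem above can be reproduced in a few lines, together with one common workaround: instead of sorting each batch in isolation, carry unreleased events over to the next batch and release only events at or below a watermark (the newest timestamp seen minus an allowed lateness). The three-minute lateness is an assumption that happens to cover this sample, and `CarryOverSorter` is a hypothetical name. In Trident the carried-over events would have to live in State, which is the kind of extra metadata discussed in this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Contrasts per-batch sorting with a watermark that carries events across batch boundaries.
class CarryOverSorter {
    private final long allowedLateness;
    private final PriorityQueue<Long> pending = new PriorityQueue<>();
    private long maxSeen = Long.MIN_VALUE;

    CarryOverSorter(long allowedLateness) { this.allowedLateness = allowedLateness; }

    // Processes one batch; releases, in order, every pending event at or below the watermark.
    List<Long> processBatch(List<Long> batch) {
        for (long ts : batch) {
            pending.add(ts);
            maxSeen = Math.max(maxSeen, ts);
        }
        long watermark = maxSeen - allowedLateness;
        List<Long> released = new ArrayList<>();
        while (!pending.isEmpty() && pending.peek() <= watermark) {
            released.add(pending.poll());
        }
        return released;
    }

    // Naive approach: sort each batch on its own and concatenate the results.
    static List<Long> sortEachBatch(List<List<Long>> batches) {
        List<Long> out = new ArrayList<>();
        for (List<Long> batch : batches) {
            List<Long> copy = new ArrayList<>(batch);
            copy.sort(null);   // natural (ascending) order
            out.addAll(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        // Minutes past 12:00, split at the batch boundary shown above.
        List<List<Long>> batches = List.of(
            List.of(43L, 44L, 45L, 49L, 47L),
            List.of(48L, 46L, 51L, 50L, 52L));
        System.out.println("per batch:  " + sortEachBatch(batches));
        CarryOverSorter sorter = new CarryOverSorter(3);
        for (List<Long> batch : batches) {
            System.out.println("carry-over: " + sorter.processBatch(batch));
        }
    }
}
```

Sorting each batch separately puts 12:46 after 12:49, while the carry-over version releases 12:43 to 12:49 in order and keeps 12:50 to 12:52 pending. Deciding when to give up on stragglers (a timeout or a larger lateness) is exactly the "when would he stop looking" question.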
>>  
>> 
>> 
>> 
>> On Tue, Aug 25, 2015 at 4:58 PM, Andrew Xor <andreas.grammenos@gmail.com <ma...@gmail.com>> wrote:
>> Yes, unless I am missing something... try it and if you have any more problems drop an email.
>> 
>> Regards.
>> 
>> On Wed, Aug 26, 2015 at 2:46 AM, Alec Lee <alec.in.bc@gmail.com <ma...@gmail.com>> wrote:
>> Wow, that code seems to be exactly what I want; I will read through it. Just to double check: I will still need a partition aggregator to actually do the sorting after bucketization, right?
>> 
>> thanks
>> 
>> 
>>> On Aug 25, 2015, at 4:40 PM, Andrew Xor <andreas.grammenos@gmail.com <ma...@gmail.com>> wrote:
>>> 
>>> Sure, I found this code useful to start with; he does bucketization for timed intervals in this gist <https://gist.github.com/codyaray/75533044fc8c0a12fa67>.
>>> 
>>> Hope this helps.
>>> 
>>> On Wed, Aug 26, 2015 at 2:36 AM, Alec Lee <alec.in.bc@gmail.com <ma...@gmail.com>> wrote:
>>> All right, I will use Trident instead. Sorry to ask again, but is there any example code (particularly for sorting events by time) to study?
>>> 
>>> thanks
>>> 
>>> 
>>>> On Aug 25, 2015, at 4:31 PM, Andrew Xor <andreas.grammenos@gmail.com <ma...@gmail.com>> wrote:
>>>> 
>>>> Well, if you just need to preserve the order of the received (event) tuples, then why not use Trident instead? Trident ensures correct (chronological) ordering as well as exactly-once processing without any gimmicks; sorting the events after they have been generated sounds like it will get you into quite a bit of hassle for no reason.
>>>> 
>>>> Regards.
>>>> 
>>>> On Wed, Aug 26, 2015 at 2:00 AM, Alec Lee <alec.in.bc@gmail.com <ma...@gmail.com>> wrote:
>>>> BTW, I am using spouts and bolts; I am currently not using Trident. Thanks
>>>> 
> 


Re: sorting events by timestamp

Posted by Andrew Xor <an...@gmail.com>.
I don't think it's easier or harder to learn... but both have pros and
cons; in your case the semantics that you are trying to apply in your
particular scenario sound more like a use-case for a Trident based topology
that's all.

Regards.


Re: sorting events by timestamp

Posted by Alec Lee <al...@gmail.com>.
For this type of problem, it seems Trident is better than spout+bolts, even though the latter is easier to understand and learn?

AL


Re: sorting events by timestamp

Posted by Alec Lee <al...@gmail.com>.
Thanks Andrew & Kishore 

What you assumed is right; here are the details. I am in the process of building a real-time data processing pipeline, and I'd like to use Kafka + Storm. Currently we don't have an API to stream the data in, so I used python-kafka to write a producer that pulls 1,000,000 rows out of a PostgreSQL DB and sends them to Kafka with send_messages(self.topic, tup[0]). To keep things simple, all 1,000,000 records go to the same topic:

bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic test-topic
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/kafka/kafka-0.8.2.1-src/core/build/dependant-libs-2.10.4/slf4j-log4j12-1.6.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/kafka/kafka-0.8.2.1-src/core/build/dependant-libs-2.10.4/slf4j-log4j12-1.7.6.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Topic:test-topic        PartitionCount:1        ReplicationFactor:1     Configs:
        Topic: test-topic       Partition: 0    Leader: 1       Replicas: 1     Isr: 1
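The earlier advice to partition in Kafka by device could look roughly like this, assuming the topic is created with more than one partition and the producer uses the sensor ID as the message key. The partition choice below (a non-negative hash of the key modulo the partition count) only illustrates the idea; it is not the exact hash Kafka's default partitioner uses. What matters is that the same key always maps to the same partition, and Kafka preserves order within a partition, so with a single producer each device's order is kept as long as that producer sends the device's events in order. `KeyPartitioning` is a hypothetical name and `febc0062` a made-up sensor ID.

```java
// Illustrative key -> partition mapping; NOT Kafka's actual default partitioner hash.
class KeyPartitioning {
    static int partitionFor(String key, int numPartitions) {
        int hash = key.hashCode();
        // Mask the sign bit so the modulo result is never negative.
        return (hash & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int partitions = 4;   // assumed; the topic above was created with PartitionCount:1
        for (String sensor : new String[] {"febc0061", "febc0062", "febc0061"}) {
            System.out.println(sensor + " -> partition " + partitionFor(sensor, partitions));
        }
    }
}
```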

I didn't create multiple partitions or any replication (should I do it this way?). When the data was pushed into Kafka, it clearly got shuffled, so when I print the timestamps in a Storm bolt, they look like this:
2013-03-21 12:01:00-07:00
12445 [ProcessThread(sid:0 cport:-1):] INFO  org.apache.storm.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x14f6af9bafd000f type:create cxid:0x3 zxid:0x27 txntype:-1 reqpath:n/a Error Path:/test-topic/e803d8e2-500c-413a-8b3a-0273e94840aa Error:KeeperErrorCode = NoNode for /test-topic/e803d8e2-500c-413a-8b3a-0273e94840aa
2013-03-21 12:59:00-07:00
2013-03-21 12:58:00-07:00
2013-03-21 12:21:00-07:00
2013-03-21 13:00:00-07:00
2013-03-21 11:47:00-07:00
2013-03-21 13:01:00-07:00
2013-03-21 11:52:00-07:00
2013-03-21 12:02:00-07:00
2013-03-21 13:02:00-07:00
2013-03-21 12:22:00-07:00
2013-03-21 12:23:00-07:00
2013-03-21 13:03:00-07:00
2013-03-21 13:05:00-07:00
2013-03-21 13:04:00-07:00
2013-03-21 12:03:00-07:00
2013-03-21 12:24:00-07:00
2013-03-21 13:07:00-07:00
2013-03-21 13:06:00-07:00
2013-03-21 12:25:00-07:00
2013-03-21 13:08:00-07:00
2013-03-21 13:09:00-07:00
2013-03-21 11:53:00-07:00
2013-03-21 12:26:00-07:00
2013-03-21 12:04:00-07:00

As Kishore said, I need to group first, say with fieldsGrouping("measurement", new Fields("sensor_id")) on a bolt with parallelism 10, so that all tuples with the same sensor_id go to the same task; I assume I don't need to worry about the grouping here. But as we discussed, I want to sort the measurement data from the same device by its timestamps, so that I can tell whether any minutes of data are missing. If there are missing minutes, I will fill them in by computing the average of the last 2 events. The timestamps seem to be completely out of order. I have never used a Trident partition aggregator and am not sure whether it can take on the sorting job; my concern is how much metadata this type of sorting requires in State. The batch issue confuses me too :(
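A minimal sketch of the fill-in step described above, assuming one device's readings are already sorted by minute and that "the average of the last 2 events" means the two readings on either side of the gap (a different reading, such as the two readings before the gap, would only change the `fill` line). `GapFiller` and `Reading` are hypothetical; `current` stands in for whichever measurement field is being smoothed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical gap filler for one device's readings, sorted by minute.
class GapFiller {
    static final class Reading {
        final long minute;      // e.g. epoch minutes
        final double current;   // the measured value
        final boolean filled;   // true if synthesized rather than measured
        Reading(long minute, double current, boolean filled) {
            this.minute = minute;
            this.current = current;
            this.filled = filled;
        }
        @Override public String toString() { return minute + "=" + current + (filled ? "*" : ""); }
    }

    // Inserts a synthetic reading for every missing minute between consecutive readings.
    static List<Reading> fillGaps(List<Reading> sorted) {
        List<Reading> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Reading cur = sorted.get(i);
            if (i > 0) {
                Reading prev = sorted.get(i - 1);
                double fill = (prev.current + cur.current) / 2.0;   // average of the two neighbours
                for (long m = prev.minute + 1; m < cur.minute; m++) {
                    out.add(new Reading(m, fill, true));
                }
            }
            out.add(cur);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Reading> readings = List.of(
            new Reading(0, 1.0, false), new Reading(1, 2.0, false), new Reading(4, 4.0, false));
        System.out.println(fillGaps(readings));
    }
}
```

This only works on data that is already in order, so it would run after one of the buffering approaches; a duplicate or out-of-order minute simply produces no fill.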


Thanks

AL


> On Aug 25, 2015, at 9:31 PM, Kishore Senji <ks...@gmail.com> wrote:
> 
> Agreed. This makes sense if the aggregation is on fields etc. 
> 
> Although Alec did not mention it in this post, based on his previous posts on the same topic, I would assume he is trying to  sort the events because he wanted to "fill in" the missing events (smoothening the curve so to speak) by looking at the previous and next events of the missed timestamp and then do some stream processing on top of it (like for example alerting based on sliding window). Assuming that is the scenario, I guess then he would have to keep more metadata in the State so that he can fill in those events but the question would be when would he stop looking for missing events and fill them and move on (as they can come in different batches), plus he would have to do some stream processing (or store them to ES for later search for example) in the State itself if there is any such processing. This is where I think it gets tricky to do this in the partition aggregator.
> 
> So in our earlier posts we suggested he can do the the appropriate partitioning in Kafka (so that events from a given device ends up in the same partition) and he could do the window based sorting (by buffering few events) in the Stream processing.
> 
> Alec, Please ignore the above if my assumption is not correct.


Re: sorting events by timestamp

Posted by Kishore Senji <ks...@gmail.com>.
Agreed. This makes sense if the aggregation is on fields etc.

Although Alec did not mention it in this post, based on his previous posts
on the same topic I would assume he is trying to sort the events because
he wants to "fill in" the missing events (smoothing the curve, so to
speak) by looking at the events before and after the missing timestamp,
and then do some stream processing on top of that (for example, alerting
based on a sliding window). Assuming that is the scenario, I guess he
would then have to keep more metadata in the State so that he can fill in
those events. But the question would be when he should stop looking for
missing events, fill them in, and move on (as they can arrive in different
batches). Plus, he would have to do any such stream processing (or, for
example, store the events in ES for later search) in the State itself.
This is where I think it gets tricky to do this in the
partition aggregator.
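Kishore gives no code for the "fill in" step. A minimal sketch, assuming the events of one device are already sorted by time and one reading is expected per minute, could fill each missing minute by linear interpolation between the readings before and after the gap. Reading and GapFiller are hypothetical names, and linear interpolation is only one possible way of "smoothing the curve".

```java
import java.util.ArrayList;
import java.util.List;

/** One reading of one device; hypothetical type used only for this sketch. */
final class Reading {
    final long epochMinute;   // timestamp truncated to whole minutes
    final double value;       // measured value
    final boolean synthetic;  // true if produced by interpolation

    Reading(long epochMinute, double value, boolean synthetic) {
        this.epochMinute = epochMinute;
        this.value = value;
        this.synthetic = synthetic;
    }

    @Override
    public String toString() {
        return epochMinute + "=" + value + (synthetic ? "*" : "");
    }
}

/** Fills gaps in an already-sorted per-minute series by linear interpolation. */
final class GapFiller {
    static List<Reading> fill(List<Reading> sorted) {
        List<Reading> out = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            Reading cur = sorted.get(i);
            if (i > 0) {
                Reading prev = sorted.get(i - 1);
                long gap = cur.epochMinute - prev.epochMinute;
                // One synthetic reading for every missing minute strictly between prev and cur.
                for (long m = 1; m < gap; m++) {
                    double v = prev.value + (cur.value - prev.value) * m / gap;
                    out.add(new Reading(prev.epochMinute + m, v, true));
                }
            }
            out.add(cur);
        }
        return out;
    }
}
```

For example, readings of 10.0 at minute 0 and 40.0 at minute 3 would gain synthetic readings of 20.0 at minute 1 and 30.0 at minute 2. The open question of how long to wait before deciding that a minute is really missing is not handled here; the input is assumed to be final.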

So in our earlier posts we suggested he can do the appropriate
partitioning in Kafka (so that events from a given device end up in the
same partition) and he could do the window-based sorting (by buffering a
few events) in the stream processing.

Alec, Please ignore the above if my assumption is not correct.
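For the "buffer a few events, then sort" approach, one common shape is a bounded reorder buffer per device: hold events in a min-heap keyed by timestamp and release an event once the newest timestamp seen is at least maxDelay ahead of it. This is a minimal sketch under that assumption, using bare timestamps instead of tuples; ReorderBuffer and maxDelay are hypothetical names, and maxDelay is in whatever unit the timestamps use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Bounded reorder buffer for ONE device's event timestamps (hypothetical helper).
 * An event is released once the newest timestamp seen is at least maxDelay
 * ahead of it; released timestamps come out in ascending order.
 */
final class ReorderBuffer {
    private final long maxDelay;                                   // same unit as the timestamps
    private final PriorityQueue<Long> pending = new PriorityQueue<>(); // min-heap by timestamp
    private long maxSeen = Long.MIN_VALUE;                         // newest timestamp seen so far
    private long lastReleased = Long.MIN_VALUE;                    // newest timestamp already emitted

    ReorderBuffer(long maxDelay) {
        this.maxDelay = maxDelay;
    }

    /** Buffers one timestamp and returns those that are now safe to emit, in order. */
    List<Long> add(long ts) {
        List<Long> ready = new ArrayList<>();
        if (ts < lastReleased) {
            // Too late: a newer event was already emitted. A real bolt might count,
            // log, or route such events to a separate stream instead of dropping them.
            return ready;
        }
        pending.add(ts);
        maxSeen = Math.max(maxSeen, ts);
        while (!pending.isEmpty() && pending.peek() <= maxSeen - maxDelay) {
            lastReleased = pending.poll();
            ready.add(lastReleased);
        }
        return ready;
    }

    /** Emits everything still buffered, e.g. on shutdown or after an idle timeout. */
    List<Long> flush() {
        List<Long> ready = new ArrayList<>();
        while (!pending.isEmpty()) {
            lastReleased = pending.poll();
            ready.add(lastReleased);
        }
        return ready;
    }
}
```

Feeding the minute values from the listing further down in this thread (43, 44, 45, 49, 47, 48, 46, 51, 50, 52) through new ReorderBuffer(3) and then calling flush() yields 43 through 52 in order; with new ReorderBuffer(2), 46 is treated as late because 47 has already been released when it arrives. In a spout/bolt topology, a fieldsGrouping on a device-id field (the field name is an assumption) would route all of a device's events to the same bolt task, which could keep one buffer per device. If tuples are anchored and acked, holding them longer than topology.message.timeout.secs (30 seconds by default) would trigger replays, so maxDelay has to stay well below that.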


On Tue, Aug 25, 2015 at 6:19 PM, Andrew Xor <an...@gmail.com>
wrote:

> This is not an issue, as that probably would be done through a partition
> aggregator after the groupBy.

Re: sorting events by timestamp

Posted by Andrew Xor <an...@gmail.com>.
This is not an issue, as that probably would be done through a partition
aggregator after the groupBy.

Kindly yours,

Andrew Grammenos

-- PGP PKey --
​ <https://www.dropbox.com/s/2kcxe59zsi9nrdt/pgpsig.txt>
https://www.dropbox.com/s/yxvycjvlsc111bh/pgpsig.txt
<https://www.dropbox.com/s/ei2nqsen641daei/pgpsig.txt>
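Alec's question about whether a partition aggregator is still needed to do the actual sorting can be made concrete. Roughly speaking, a Trident Aggregator has three callbacks: init at the start of a batch (or of each group after a groupBy), aggregate once per tuple, and complete once all tuples have been seen. The Storm-free sketch below only mirrors that lifecycle with plain Java types so it can run on its own; a real version would extend Trident's BaseAggregator and emit through a TridentCollector, and SortingAggregatorSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

/**
 * Storm-free sketch of a "sort the batch, then emit" aggregator. The three
 * methods mirror the init / aggregate / complete callbacks of a Trident
 * Aggregator; the Consumer stands in for a TridentCollector.
 */
final class SortingAggregatorSketch {
    /** Start of a batch (or of one group within it): begin with an empty buffer. */
    List<Long> init(Object batchId) {
        return new ArrayList<>();
    }

    /** One call per tuple: remember its timestamp. */
    void aggregate(List<Long> state, long timestamp) {
        state.add(timestamp);
    }

    /** End of the batch: sort what was collected and emit it in ascending order. */
    void complete(List<Long> state, Consumer<Long> emit) {
        Collections.sort(state);
        for (Long ts : state) {
            emit.accept(ts);
        }
    }
}
```

Only the tuples of one batch are visible between init and complete, which is exactly the batch-boundary concern in the Kishore message quoted below.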

On Wed, Aug 26, 2015 at 4:16 AM, Kishore Senji <ks...@gmail.com> wrote:

> Interesting. But wouldn't this be impacted by the Trident batch size?
>
> Assuming the batch boundary is like below, after bucketing you would
> groupBy on the start time (but how would you sort it?) and, assuming it
> can be sorted, you would be done with that batch. So if the batch
> boundary is like below, you would end up with two separate sorts for
> events which are supposed to be together (12:44, 12:45 & 12:46 below). If
> I understand the original question correctly, it is how to sort the full
> stream of events irrespective of how they are processed in batches.
>
> 2013-03-22 12:43:00-07:00
> 2013-03-22 12:44:00-07:00
> 2013-03-22 12:45:00-07:00
> 2013-03-22 12:49:00-07:00
> 2013-03-22 12:47:00-07:00
> --------------------------------------
> 2013-03-22 12:48:00-07:00
> 2013-03-22 12:46:00-07:00
> 2013-03-22 12:51:00-07:00
> 2013-03-22 12:50:00-07:00
> 2013-03-22 12:52:00-07:00

Re: sorting events by timestamp

Posted by Kishore Senji <ks...@gmail.com>.
Interesting. But wouldn't this be impacted by the Trident batch size?

Assuming the batch boundary is like below, after bucketing you would
groupBy on the start time (but how would you sort it?) and, assuming it
can be sorted, you would be done with that batch. So if the batch
boundary is like below, you would end up with two separate sorts for
events which are supposed to be together (12:44, 12:45 & 12:46 below). If
I understand the original question correctly, it is how to sort the full
stream of events irrespective of how they are processed in batches.

2013-03-22 12:43:00-07:00
2013-03-22 12:44:00-07:00
2013-03-22 12:45:00-07:00
2013-03-22 12:49:00-07:00
2013-03-22 12:47:00-07:00
--------------------------------------
2013-03-22 12:48:00-07:00
2013-03-22 12:46:00-07:00
2013-03-22 12:51:00-07:00
2013-03-22 12:50:00-07:00
2013-03-22 12:52:00-07:00
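Kishore's point can be checked in a few lines of plain Java: sorting each batch on its own and concatenating the results, with the minute values from the listing above split at the dashed line, does not produce a globally sorted sequence. PerBatchSortDemo is a hypothetical name, and this models per-batch sorting in general rather than Trident itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class PerBatchSortDemo {
    /** Sorts every batch independently and concatenates the results. */
    static List<Integer> sortEachBatch(List<List<Integer>> batches) {
        List<Integer> out = new ArrayList<>();
        for (List<Integer> batch : batches) {
            List<Integer> copy = new ArrayList<>(batch);
            Collections.sort(copy);
            out.addAll(copy);
        }
        return out;
    }

    /** True if the list is in non-decreasing order. */
    static boolean isSorted(List<Integer> xs) {
        for (int i = 1; i < xs.size(); i++) {
            if (xs.get(i - 1) > xs.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Minutes past 12:00 from the listing, split at the batch boundary.
        List<List<Integer>> batches = Arrays.asList(
                Arrays.asList(43, 44, 45, 49, 47),
                Arrays.asList(48, 46, 51, 50, 52));
        List<Integer> merged = sortEachBatch(batches);
        // prints [43, 44, 45, 47, 49, 46, 48, 50, 51, 52] sorted=false
        System.out.println(merged + " sorted=" + isSorted(merged));
    }
}
```

The 46 in the second batch ends up after 47 and 49 from the first, so some state that spans batches (or a buffer such as a bounded reorder window) is needed for a global order.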




On Tue, Aug 25, 2015 at 4:58 PM, Andrew Xor <an...@gmail.com>
wrote:

> Yes, unless I am missing something. Try it, and if you have any more
> problems, drop an email.

Re: sorting events by timestamp

Posted by Andrew Xor <an...@gmail.com>.
Yes, unless I am missing something. Try it, and if you have any more
problems, drop an email.

Regards.

Kindly yours,

Andrew Grammenos


On Wed, Aug 26, 2015 at 2:46 AM, Alec Lee <al...@gmail.com> wrote:

> Wow, that code seems to be exactly what I want; I will read through it.
> To double check: I will still need a partition aggregator to actually do
> the sorting after bucketization, right?
>
> thanks

Re: sorting events by timestamp

Posted by Alec Lee <al...@gmail.com>.
Wow, that code seems to be exactly what I want; I will read through it. To double check: I will still need a partition aggregator to actually do the sorting after bucketization, right?

thanks


> On Aug 25, 2015, at 4:40 PM, Andrew Xor <an...@gmail.com> wrote:
> 
> Sure, I found this code useful to start with; he does bucketization for timed intervals in this gist <https://gist.github.com/codyaray/75533044fc8c0a12fa67>.


Re: sorting events by timestamp

Posted by Andrew Xor <an...@gmail.com>.
Sure, I found this code useful to start with; he does bucketization for
timed intervals in this gist
<https://gist.github.com/codyaray/75533044fc8c0a12fa67>.

Hope this helps.

Kindly yours,

Andrew Grammenos
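The gist itself is not reproduced here. Independently of it, fixed-interval bucketization usually comes down to flooring each timestamp to the start of its interval and grouping on that value. A minimal sketch, assuming epoch-millisecond timestamps; Buckets, bucketStart and groupByBucket are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Fixed-interval bucketization sketch: each timestamp is mapped to the start
 * of the interval containing it, and timestamps are grouped per bucket.
 * TreeMap keeps the buckets themselves in ascending order.
 */
final class Buckets {
    /** Start of the interval containing ts; floorDiv keeps pre-1970 values correct. */
    static long bucketStart(long tsMillis, long intervalMillis) {
        return Math.floorDiv(tsMillis, intervalMillis) * intervalMillis;
    }

    /** Groups timestamps by bucket start; order inside a bucket is arrival order. */
    static TreeMap<Long, List<Long>> groupByBucket(List<Long> tsMillis, long intervalMillis) {
        TreeMap<Long, List<Long>> buckets = new TreeMap<>();
        for (long ts : tsMillis) {
            buckets.computeIfAbsent(bucketStart(ts, intervalMillis), k -> new ArrayList<>())
                   .add(ts);
        }
        return buckets;
    }
}
```

For example, with a five-minute interval (300000 ms), timestamps 0, 60000 and 299999 share bucket 0 while 300000 starts the next bucket. The bucket start is a natural groupBy key; sorting inside each bucket is still a separate step, which is what Alec's follow-up asks about.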


On Wed, Aug 26, 2015 at 2:36 AM, Alec Lee <al...@gmail.com> wrote:

> All right, I will use Trident instead. Shameless to ask again, but is there
> any example code (particularly for event-time sorting) to study?
>
> thanks

Re: sorting events by timestamp

Posted by Alec Lee <al...@gmail.com>.
All right, I will use Trident instead. Shameless to ask again, but is there any example code (particularly for event-time sorting) to study?

thanks


> On Aug 25, 2015, at 4:31 PM, Andrew Xor <an...@gmail.com> wrote:
> 
> Well, if you just need to preserve the order of received (event) tuples, then why not use Trident instead? Trident ensures correct (chronological) ordering as well as exactly-once processing without any gimmicks; sorting the events after they have been generated sounds like it will get you into quite a bit of hassle for no reason.


Re: sorting events by timestamp

Posted by Andrew Xor <an...@gmail.com>.
Well, if you just need to preserve the order of received (event) tuples,
then why not use Trident instead? Trident ensures correct (chronological)
ordering as well as exactly-once processing without any gimmicks; sorting
the events after they have been generated sounds like it will get you into
quite a bit of hassle for no reason.

Regards.

Kindly yours,

Andrew Grammenos


On Wed, Aug 26, 2015 at 2:00 AM, Alec Lee <al...@gmail.com> wrote:

> BTW, I am using spouts and bolts; currently not using Trident. Thanks

Re: sorting events by timestamp

Posted by Alec Lee <al...@gmail.com>.
BTW, I am using spouts and bolts; currently not using Trident. Thanks


> On Aug 25, 2015, at 3:47 PM, Andrew Xor <an...@gmail.com> wrote:
> 
> What do you mean by that? It's a bit vague, as timestamps can have quite high resolution (for example minutes, seconds, msec), so you will probably have to do a bit of bucketization before sorting them. Then, by using a partition aggregator (in Trident at least), you can do this very easily.


Re: sorting events by timestamp

Posted by Andrew Xor <an...@gmail.com>.
What do you mean by that? It's a bit vague, as timestamps can have quite
high resolution (for example minutes, seconds, msec), so you will
probably have to do a bit of bucketization before sorting them. Then, by
using a partition aggregator (in Trident at least), you can do this very
easily.
Hope this helps.

Kindly yours,

Andrew Grammenos


On Wed, Aug 26, 2015 at 1:37 AM, Alec Lee <al...@gmail.com> wrote:

> Hi, all
>
> Is there any sample code to sort events by the timestamp field of a
> tuple?
>
> thanks
>
>
> AL
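Because the timestamps in this thread have the form "2013-03-22 12:43:00-07:00", they can be parsed with java.time (Java 8 or later). A minimal sketch that parses them, truncates them to a chosen resolution (minutes here) and sorts them by the instant they denote; TimestampSort is a hypothetical name, and the pattern assumes every value has exactly this shape.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TimestampSort {
    // "XXX" parses offsets written with a colon, such as -07:00.
    static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ssXXX");

    /** Parses, truncates to whole minutes, and sorts by point on the time line. */
    static List<OffsetDateTime> parseAndSort(List<String> raw) {
        List<OffsetDateTime> out = new ArrayList<>();
        for (String s : raw) {
            out.add(OffsetDateTime.parse(s, FORMAT).truncatedTo(ChronoUnit.MINUTES));
        }
        // Compare by instant so values written with different offsets still sort correctly.
        out.sort(Comparator.comparing(OffsetDateTime::toInstant));
        return out;
    }
}
```

Comparing by toInstant() keeps the order correct even if some producers write a different offset, for example -04:00 instead of -07:00.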