Posted to dev@hama.apache.org by "Edward J. Yoon (JIRA)" <ji...@apache.org> on 2014/04/03 03:43:15 UTC

[jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

    [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13958430#comment-13958430 ] 

Edward J. Yoon commented on HAMA-883:
-------------------------------------

NOTE: my colleague is currently working on this issue - https://github.com/garudakang/meerkat

> [Research Task] Massive log event aggregation in real time using Apache Hama
> ----------------------------------------------------------------------------
>
>                 Key: HAMA-883
>                 URL: https://issues.apache.org/jira/browse/HAMA-883
>             Project: Hama
>          Issue Type: Task
>            Reporter: Edward J. Yoon
>
> BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might be able to build a platform for this kind of processing.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by "Edward J. Yoon" <ed...@apache.org>.
No. Please read my mail again. One task creates the topology map and
broadcasts it to all peers at the first superstep.

MapWritable<GroupName, List<HostName>> topology;
..
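To make the topology map concrete, here is a minimal sketch of how a task
might use it once it has been received. It is plain Java with no Hama or
Hadoop dependency: the generic type in the message reads as shorthand, so a
java.util.Map from group name to a list of "host:port" peer names stands in
for it. The class name, the helper methods, and the rule that "the next group
is the next entry in insertion order" are all assumptions made for
illustration.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class TopologyLookupSketch {

    /** Returns the name of the group containing the given peer, or null if none does. */
    static String groupOf(Map<String, List<String>> topology, String peerName) {
        for (Map.Entry<String, List<String>> entry : topology.entrySet()) {
            if (entry.getValue().contains(peerName)) {
                return entry.getKey();
            }
        }
        return null;
    }

    /** Returns the peers of the group that follows the given one (empty for the last group). */
    static List<String> nextGroupPeers(Map<String, List<String>> topology, String groupName) {
        boolean seen = false;
        for (Map.Entry<String, List<String>> entry : topology.entrySet()) {
            if (seen) {
                return entry.getValue();
            }
            seen = entry.getKey().equals(groupName);
        }
        return Collections.emptyList();
    }

    /** Hypothetical two-group topology, used only for illustration. */
    static Map<String, List<String>> exampleTopology() {
        // LinkedHashMap keeps insertion order, which this sketch treats as pipeline order.
        Map<String, List<String>> topology = new LinkedHashMap<>();
        topology.put("Group1", Arrays.asList("server1:60001", "server2:60001"));
        topology.put("Group2", Arrays.asList("server1:60002", "server2:60002"));
        return topology;
    }

    public static void main(String[] args) {
        Map<String, List<String>> topology = exampleTopology();
        String myGroup = groupOf(topology, "server2:60001");
        // prints: Group1 -> [server1:60002, server2:60002]
        System.out.println(myGroup + " -> " + nextGroupPeers(topology, myGroup));
    }
}
```

With such a lookup, every peer can work out its own group and its downstream
peers from the single broadcast, without further coordination.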




-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer Co., Ltd.

Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Chia-Hung Lin <cl...@googlemail.com>.
No problem. It's a good discussion so we can examine and improve accordingly.

I am still not very sure about the topology, or how tasks are grouped.
From the description, it seems to look like the link below:

http://i.imgur.com/92L2XY1.png

Each GroomServer is viewed as a group, and each group launches 3
tasks by default (as defined in the default XML configuration). So are
the corresponding messages, emitted from a source such as a queue, sent
to each group for consumption? And how do tasks communicate between
groups and between tasks?





Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by "Edward J. Yoon" <ed...@datasayer.com>.
My rough idea assumes that a dedicated Hama is installed on the machines that
generate logs, and that the same number of child tasks is launched per
GroomServer. So, if the number of groups is 3, the framework launches 3 tasks
per node. At the first superstep, one task groups the tasks into 3 groups and
then broadcasts the topology.

== Group1 ==
server1:60001
server2:60001
server3:60001

== Group2 ==
server1:60002
server2:60002
server3:60002

== Group3 ==
server1:60003
server2:60003
server3:60003

Based on this topology, each task instantiates the proper class via
reflection and executes it. Then it'll work like a Storm flow. I haven't
thought about the FT issue yet. :-)
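The grouping shown above can be sketched as follows. This is only an
illustration in plain Java, not Hama code: it assumes peer names have the form
"host:port", that every host runs the same number of tasks, and that the i-th
task on each host (ordered by port) belongs to group i. The class and method
names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopologyBuilderSketch {

    /** Extracts the port from a "host:port" peer name. */
    static int portOf(String peerName) {
        return Integer.parseInt(peerName.substring(peerName.lastIndexOf(':') + 1));
    }

    /** Groups peers so that the i-th task (by port) on every host lands in group i. */
    static Map<String, List<String>> buildTopology(List<String> peerNames) {
        // Collect the peers of each host; TreeMap keeps hosts in a stable order.
        Map<String, List<String>> byHost = new TreeMap<>();
        for (String peer : peerNames) {
            String host = peer.substring(0, peer.lastIndexOf(':'));
            byHost.computeIfAbsent(host, h -> new ArrayList<>()).add(peer);
        }

        // Assign the i-th peer of each host to "Group<i+1>".
        Map<String, List<String>> topology = new LinkedHashMap<>();
        for (List<String> peers : byHost.values()) {
            peers.sort(Comparator.comparingInt(TopologyBuilderSketch::portOf));
            for (int i = 0; i < peers.size(); i++) {
                topology.computeIfAbsent("Group" + (i + 1), g -> new ArrayList<>())
                        .add(peers.get(i));
            }
        }
        return topology;
    }

    public static void main(String[] args) {
        List<String> peers = Arrays.asList(
                "server1:60001", "server1:60002", "server1:60003",
                "server2:60001", "server2:60002", "server2:60003",
                "server3:60001", "server3:60002", "server3:60003");
        buildTopology(peers).forEach((group, members) ->
                System.out.println(group + " = " + members));
    }
}
```

Because the result depends only on the sorted peer names, any single task can
compute it and broadcast it, which matches the "one task broadcasts the
topology" step.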







Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Chia-Hung Lin <cl...@googlemail.com>.
Or we can build a POC first and then see how it relates to the issues we
might need to fix.


Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Chia-Hung Lin <cl...@googlemail.com>.
In that case, are we going to organize multiple tasks into a group? A
job has N BSP groups (a BSP task in the current code), and in turn each
group contains multiple tasks (all on the same server)?

If this is the case, how do they send messages or communicate between
groups? Group to group? Can a task (within a group) send messages
arbitrarily?

I ask because this has implications for fault tolerance (FT). IIRC
Storm is a CEP framework, and messages can be sent arbitrarily to any
bolt. The issue with such computation is that checkpointing is not
simple. Generally it's done through communication-induced
checkpointing. Otherwise, like Storm, they ack and replay each
message when necessary; another option is something like batch
transactional processing (Trident in Storm, if I am correct).

What I can think of right now is, with the current structure, grouping
every N messages into a superstep and then checkpointing
asynchronously, which may be similar to Trident batch processing.

I understand it's still far off given the current status. I suppose
it's good if we can take that into consideration beforehand as well.
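The "every N messages is one superstep, then checkpoint asynchronously" idea
can be sketched roughly as below. This is a single-process illustration in
plain Java rather than Hama code: the class and method names are hypothetical,
the per-message work is reduced to counting, and a checkpoint is simply the
number of messages processed so far, recorded by a background thread so the
next superstep does not wait for it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class BatchedSuperstepSketch {

    /** Splits the message stream into batches of at most n messages, one batch per superstep. */
    static List<List<String>> toSupersteps(List<String> messages, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < messages.size(); i += n) {
            batches.add(new ArrayList<>(messages.subList(i, Math.min(i + n, messages.size()))));
        }
        return batches;
    }

    /** Processes the batches and returns the checkpoints (messages processed after each superstep). */
    static List<Integer> runWithCheckpoints(List<String> messages, int n) {
        // A single background thread writes checkpoints in submission order.
        ExecutorService checkpointer = Executors.newSingleThreadExecutor();
        List<Integer> checkpoints = Collections.synchronizedList(new ArrayList<>());
        int processed = 0;
        try {
            for (List<String> batch : toSupersteps(messages, n)) {
                processed += batch.size();      // stand-in for the real per-message work
                final int snapshot = processed; // state captured at the superstep barrier
                checkpointer.execute(() -> { checkpoints.add(snapshot); });
            }
        } finally {
            checkpointer.shutdown();
        }
        try {
            if (!checkpointer.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("checkpointing did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return new ArrayList<>(checkpoints);
    }

    public static void main(String[] args) {
        List<String> messages = Arrays.asList("m1", "m2", "m3", "m4", "m5", "m6", "m7");
        // prints: [3, 6, 7]
        System.out.println(runWithCheckpoints(messages, 3));
    }
}
```

On recovery, a task would restart from the last recorded checkpoint and replay
at most one batch, which is where the similarity to batch-style transactional
processing comes from.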






Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by "Edward J. Yoon" <ed...@apache.org>.
Yesterday, I surveyed Storm. Storm's task grouping and chainable
bolts seem pretty nice (in particular, chainable bolts can be really
useful for real-time join operations).

I think we can also implement functions similar to Storm's task
grouping and chainable bolts on BSP. My rough idea is:

1. Launch multiple tasks per node (one per group of bolts). For example:

+---------------+
|    Server1    |
+---------------+
Task-1. tailing bolt
Task-2. split sentence bolt
Task-3. wordcount bolt

2. Assign the tasks to the proper groups.
--
3. Each task executes its user-defined function and sends messages
to the tasks of the next group.
4. Synchronize all tasks.
--
5. Finally, repeat steps 3 and 4.

The only difficult part here is how to determine the task group at the
initial superstep. So, I'd like to add the method below to the BSPPeer
interface.

  /**
   * @return the names of locally adjacent peers (including this peer).
   */
  public String[] getAdjacentPeerNames();
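Steps 3 to 5 can be illustrated with a small single-process simulation of the
tailing, split sentence, and wordcount tasks. It does not use the Hama API or
the proposed getAdjacentPeerNames() method; the class name, the inbox lists,
and the choice to let the tailing task forward one line per superstep are
assumptions made for the sketch. The point it shows is that a message sent in
one superstep only becomes visible to the next group after the barrier.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class BoltPipelineSketch {

    /** Runs the three-stage pipeline over the given log lines and returns the word counts. */
    static Map<String, Integer> run(List<String> logLines) {
        List<String> tailInbox = new ArrayList<>(logLines); // Task-1 input: raw log lines
        List<String> splitInbox = new ArrayList<>();        // Task-2 input: lines to split
        List<String> countInbox = new ArrayList<>();        // Task-3 input: words to count
        Map<String, Integer> counts = new TreeMap<>();

        // Step 5: repeat supersteps until no task has pending messages.
        while (!tailInbox.isEmpty() || !splitInbox.isEmpty() || !countInbox.isEmpty()) {
            List<String> sentToSplit = new ArrayList<>();
            List<String> sentToCount = new ArrayList<>();

            // Step 3: each task runs its function and sends to the next group.
            if (!tailInbox.isEmpty()) {
                sentToSplit.add(tailInbox.remove(0)); // tailing: forward one line
            }
            for (String line : splitInbox) {          // split sentence: emit words
                for (String word : line.trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        sentToCount.add(word);
                    }
                }
            }
            for (String word : countInbox) {          // wordcount: update totals
                counts.merge(word, 1, Integer::sum);
            }

            // Step 4: barrier. Messages sent above are delivered for the next superstep.
            splitInbox = sentToSplit;
            countInbox = sentToCount;
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints: {a=1, b=2, c=1}
        System.out.println(run(Arrays.asList("a b", "b c")));
    }
}
```

In this sketch a line needs three supersteps to travel from the tailing task to
the final count, so the latency of the pipeline grows with the number of
groups.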






Re: [jira] [Commented] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Yexi Jiang <ye...@gmail.com>.
great~





-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/