Posted to dev@hama.apache.org by "Edward J. Yoon (JIRA)" <ji...@apache.org> on 2014/03/04 07:54:22 UTC

[jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

     [ https://issues.apache.org/jira/browse/HAMA-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Edward J. Yoon updated HAMA-883:
--------------------------------

    Summary: [Research Task] Massive log event aggregation in real time using Apache Hama  (was: [Research Task] Massive log data aggregation in real time using Apache Hama)

> [Research Task] Massive log event aggregation in real time using Apache Hama
> ----------------------------------------------------------------------------
>
>                 Key: HAMA-883
>                 URL: https://issues.apache.org/jira/browse/HAMA-883
>             Project: Hama
>          Issue Type: Task
>            Reporter: Edward J. Yoon
>
> BSP tasks can be used for aggregating log data streamed in real time. With this research task, we might be able to turn this kind of processing into a platform.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Chia-Hung Lin <cl...@googlemail.com>.
BSP is a bridging model that isn't restricted to any particular
usage. My understanding (I could be wrong) is that our framework needs
to address this issue. [1], for example, proposes a BSP-based solution
for real-time applications.

[1]. Hartley J.K., Bargiela A., TPML: Parallel meta-language for
scientific and engineering computations using transputers (TPML),
Proc. of 2nd Int. Conf. on Software for Supercomputers and
Multiprocessors, SMS'94, 1994, pp. 22-31




On 4 March 2014 21:20, Yexi Jiang <ye...@gmail.com> wrote:
> I am very interested in this topic since my research area includes event
> mining, but can BSP perform real-time computing?
>
> I once used a message-queue-based solution to collect event logs.
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Chia-Hung Lin <cl...@googlemail.com>.
This is just my personal view. We could refactor BSP to be more
modular so that people can choose the pieces that fit their requirements.
BSP is a general model, so it would be good if we could create
a flexible framework.



On 5 March 2014 12:25, Edward J. Yoon <ed...@apache.org> wrote:
> Why not?
>
> Sent from my iPhone

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by "Edward J. Yoon" <ed...@apache.org>.
Why not?

Sent from my iPhone

> On 2014. 3. 5., at 1:09 PM, Yexi Jiang <ye...@gmail.com> wrote:
>
> Yes, currently Hama does not support streaming input and streaming output.
> That's why it is currently not a natural choice for people with real-time
> computing needs.
>
> Do we really need to make Hama support real-time computing? In that
> case, we need to compete with Storm...

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Yexi Jiang <ye...@gmail.com>.
Yes, currently Hama does not support streaming input and streaming output.
That's why it is currently not a natural choice for people with real-time
computing needs.

Do we really need to make Hama support real-time computing? In that
case, we need to compete with Storm...


2014-03-04 22:58 GMT-05:00 Chia-Hung Lin <cl...@googlemail.com>:

> I used Twitter Storm previously. Storm is an excellent framework for
> real-time processing.
>
> For Hama to handle real-time tasks, the framework, in my opinion, needs
> to decouple I/O from HDFS so that the input source is not restricted
> to just HDFS.



-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Chia-Hung Lin <cl...@googlemail.com>.
I used Twitter Storm previously. Storm is an excellent framework for
real-time processing.

For Hama to handle real-time tasks, the framework, in my opinion, needs
to decouple I/O from HDFS so that the input source is not restricted
to just HDFS.
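
As a rough illustration of what decoupling the input from one storage
system could look like, here is a minimal sketch in plain Python. It does
not use Hama's API: EventSource, FileSource, QueueSource and count_errors
are hypothetical names, the file-like object only stands in for a file on
HDFS, and the in-memory queue only stands in for a message broker.

```python
import io
import queue
from typing import Iterator, Protocol, TextIO


class EventSource(Protocol):
    """Hypothetical input abstraction: anything that can yield log lines."""

    def events(self) -> Iterator[str]:
        ...


class FileSource:
    """Reads lines from a file-like object (stand-in for a file on HDFS)."""

    def __init__(self, handle: TextIO) -> None:
        self.handle = handle

    def events(self) -> Iterator[str]:
        for line in self.handle:
            yield line.rstrip("\n")


class QueueSource:
    """Drains an in-memory queue (stand-in for a message broker)."""

    def __init__(self, pending: queue.Queue) -> None:
        self.pending = pending

    def events(self) -> Iterator[str]:
        # A real streaming source would block and wait for more events;
        # this sketch simply stops once the queue is empty.
        while True:
            try:
                yield self.pending.get_nowait()
            except queue.Empty:
                return


def count_errors(source: EventSource) -> int:
    """Processing code depends only on EventSource, not on the storage."""
    return sum(1 for line in source.events() if line.startswith("ERROR"))


file_source = FileSource(io.StringIO("ERROR disk full\nINFO started\n"))

pending = queue.Queue()
for line in ["ERROR timeout", "ERROR disk full", "INFO ok"]:
    pending.put(line)
queue_source = QueueSource(pending)

file_errors = count_errors(file_source)
queue_errors = count_errors(queue_source)
```

The point of the sketch is only that count_errors never needs to know
whether the events come from a file or from a queue.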

On 5 March 2014 09:30, Yexi Jiang <ye...@gmail.com> wrote:
> Please correct me if I'm wrong. My understanding is that aggregating logs
> means collecting the logs generated by each monitored machine in real time.
> The collection is continuous, like a data stream, and never ends.
>
> I know how to use Hama to aggregate logs batch by batch (e.g. aggregating
> the logs incrementally each day), but I cannot immediately think of a way
> to use Hama to solve this problem in real time.

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by "Edward J. Yoon" <ed...@apache.org>.
I'm thinking about coupling it with incremental ML algorithms.
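
One simple example of an incremental algorithm is a running mean and
variance (Welford's online update) used as a crude anomaly check on a
metric stream. The sketch below is plain Python and only an illustration
under assumptions: RunningStats and the 3-sigma threshold are hypothetical
choices, not something from Hama or from this thread.

```python
import math


class RunningStats:
    """Running mean and variance, updated one observation at a time."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Welford's update: no past observations need to be stored."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def stddev(self) -> float:
        """Sample standard deviation; 0.0 until two values have been seen."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomaly(self, x: float, threshold: float = 3.0) -> bool:
        """Flag x if it lies more than `threshold` deviations from the mean."""
        s = self.stddev
        return s > 0 and abs(x - self.mean) > threshold * s


stats = RunningStats()
for latency_ms in [10.0, 12.0, 11.0, 9.0, 10.0, 11.0]:
    stats.update(latency_ms)

spike_flagged = stats.is_anomaly(50.0)
normal_flagged = stats.is_anomaly(11.0)
```

Because the state is just three numbers, it could be kept per key and
updated as each event arrives.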

On Wed, Mar 5, 2014 at 11:16 AM, Yexi Jiang <ye...@gmail.com> wrote:
> I once implemented a system monitor/log collector using ActiveMQ and a
> real-time anomaly detection algorithm on top of Twitter's Storm. I think
> people like me would naturally choose such a streaming framework to
> handle this scenario.
>
> For real-time computation, what unique characteristics of Hama would
> make people choose it over Storm? In my humble opinion, one unique
> characteristic of Hama is that it provides a general BSP computing
> framework (compared with Giraph, which provides BSP only for graph
> computing). No one else has this ability.



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Yexi Jiang <ye...@gmail.com>.
I once implemented a system monitor/log collector using ActiveMQ and a
real-time anomaly detection algorithm on top of Twitter's Storm. I think
people like me would naturally choose such a streaming framework to
handle this scenario.

For real-time computation, what unique characteristics of Hama would
make people choose it over Storm? In my humble opinion, one unique
characteristic of Hama is that it provides a general BSP computing
framework (compared with Giraph, which provides BSP only for graph
computing). No one else has this ability.


2014-03-04 21:02 GMT-05:00 Edward J. Yoon <ed...@apache.org>:

> The final goal can be a real-time event processing framework for
> distributed event detection, filtering, and aggregation. I guess that
> can be done with only 3 components:
>
>  * Event processing job configuration interface.
>  * User-defined function that handles the stream input.
>  * Master Aggregator(s) and its client library.
>
> I expect this can be applied to tasks such as web clickstream log
> analysis (for large-scale web servers), finding hot search keywords, and
> detecting system errors in real time, and users will be able to program
> them in a few minutes.



-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by "Edward J. Yoon" <ed...@apache.org>.
The final goal can be a real-time event processing framework for
distributed event detection, filtering, and aggregation. I guess that
can be done with only 3 components:

 * Event processing job configuration interface.
 * User-defined function that handles the stream input.
 * Master Aggregator(s) and its client library.

I expect this can be applied to tasks such as web clickstream log
analysis (for large-scale web servers), finding hot search keywords, and
detecting system errors in real time, and users will be able to program
them in a few minutes.
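
To make the three components above a little more concrete, here is a
minimal sketch in plain Python. None of it is Hama's API: JobConfig,
MasterAggregator, extract_error_code and run_job are hypothetical names,
and the workers are simulated in a single process.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class JobConfig:
    """Hypothetical job configuration: which UDF to run, on how many workers."""
    udf: Callable[[str], Optional[str]]
    num_workers: int = 2


class MasterAggregator:
    """Hypothetical master aggregator: merges partial counts from workers."""

    def __init__(self) -> None:
        self.totals: Counter = Counter()

    def merge(self, partial: Counter) -> None:
        self.totals.update(partial)


def extract_error_code(line: str) -> Optional[str]:
    """Example UDF: return the code of an ERROR line, or None to filter it out."""
    parts = line.split()
    if len(parts) >= 2 and parts[0] == "ERROR":
        return parts[1]
    return None


def run_job(config: JobConfig, stream: Iterable[str]) -> Counter:
    """Spread events over simulated workers, apply the UDF, merge at the master."""
    partials = [Counter() for _ in range(config.num_workers)]
    for i, line in enumerate(stream):
        key = config.udf(line)
        if key is not None:
            partials[i % config.num_workers][key] += 1
    master = MasterAggregator()
    for partial in partials:
        master.merge(partial)
    return master.totals


events = [
    "ERROR E42 disk full",
    "INFO started",
    "ERROR E42 disk full",
    "ERROR E7 timeout",
]
totals = run_job(JobConfig(udf=extract_error_code), events)
```

In this sketch the only thing a user writes is extract_error_code;
swapping in a different UDF would give hot search keywords or click
counts instead of error codes.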


On Wed, Mar 5, 2014 at 10:30 AM, Yexi Jiang <ye...@gmail.com> wrote:
> Please correct me if I'm wrong. My understanding is that aggregating logs
> means collecting the logs generated by each monitored machine in real time.
> The collection is continuous, like a data stream, and never ends.
>
> I know how to use Hama to aggregate logs batch by batch (e.g. aggregating
> the logs incrementally each day), but I cannot immediately think of a way
> to use Hama to solve this problem in real time.



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Yexi Jiang <ye...@gmail.com>.
Please correct me if I'm wrong. My understanding of log aggregation is
that the logs generated on each monitored machine are collected in real
time. The collection is continuous, like a data stream, and never ends.

I know how to use Hama to aggregate the logs batch by batch (e.g. aggregate
the logs incrementally each day), but I cannot immediately think of a way
to use Hama to solve this problem in real time.
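
A minimal sketch of the batch-by-batch approach mentioned above, written in
plain Java and not against the Hama API. The class and method names
(BatchLogAggregator, addBatch, snapshot) are hypothetical, and events are
assumed to be plain strings such as a log level. Shrinking the batch interval
moves this closer to a stream, but whether that would count as "real time"
is exactly the open question here.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: incremental, batch-by-batch event counting.
public class BatchLogAggregator {
    // Running totals carried over from one batch to the next.
    private final Map<String, Long> totals = new HashMap<>();

    // Fold one batch of events (e.g. one day's logs) into the running totals.
    public void addBatch(List<String> events) {
        for (String event : events) {
            totals.merge(event, 1L, Long::sum);
        }
    }

    // Return a copy so callers cannot modify the internal state.
    public Map<String, Long> snapshot() {
        return new HashMap<>(totals);
    }

    public static void main(String[] args) {
        BatchLogAggregator agg = new BatchLogAggregator();
        agg.addBatch(Arrays.asList("ERROR", "INFO", "ERROR")); // first batch
        agg.addBatch(Arrays.asList("INFO", "WARN"));           // second batch
        System.out.println(agg.snapshot());
    }
}
```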


2014-03-04 19:32 GMT-05:00 Edward J. Yoon <ed...@apache.org>:

> The aggregators in the Graph package do similar work: monitoring,
> global communication, ..., etc.
>
>
>
> On Tue, Mar 4, 2014 at 10:20 PM, Yexi Jiang <ye...@gmail.com> wrote:
> > I am very interested in this topic since my research area includes event
> > mining, but can BSP do real-time computing?
> >
> > I once used a message-queue-based solution to collect the event logs.
> >
> >
> >
> >
> >
> > --
> > ------
> > Yexi Jiang,
> > ECS 251,  yjian004@cs.fiu.edu
> > School of Computer and Information Science,
> > Florida International University
> > Homepage: http://users.cis.fiu.edu/~yjian004/
>
>
>
> --
> Edward J. Yoon (@eddieyoon)
> Chief Executive Officer
> DataSayer, Inc.
>



-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by "Edward J. Yoon" <ed...@apache.org>.
The aggregators in the Graph package do similar work: monitoring,
global communication, ..., etc.
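
As a rough illustration of that aggregator idea, here is a sketch in plain
Java. It does not use Hama's Graph package or its aggregator classes.
SuperstepAggregation, countLocal and mergeAtBarrier are hypothetical names.
The assumption is that each peer counts the events it received during one
superstep, and the partial counts are merged into a global value at the
barrier.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: one superstep of peer-local counting
// followed by a global merge at the synchronization barrier.
public class SuperstepAggregation {

    // Local step: count the events a single peer received in this superstep.
    public static Map<String, Long> countLocal(List<String> events) {
        Map<String, Long> local = new HashMap<>();
        for (String event : events) {
            local.merge(event, 1L, Long::sum);
        }
        return local;
    }

    // Barrier step: merge the partial counts of all peers into one global view.
    public static Map<String, Long> mergeAtBarrier(List<Map<String, Long>> partials) {
        Map<String, Long> global = new TreeMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> entry : partial.entrySet()) {
                global.merge(entry.getKey(), entry.getValue(), Long::sum);
            }
        }
        return global;
    }

    public static void main(String[] args) {
        // Two simulated peers, each with its own slice of the incoming events.
        List<Map<String, Long>> partials = Arrays.asList(
            countLocal(Arrays.asList("ERROR", "WARN")),
            countLocal(Arrays.asList("ERROR")));
        System.out.println(mergeAtBarrier(partials));
    }
}
```

Repeating the two steps once per superstep would give every peer an updated
global count after each barrier, so the superstep length bounds how fresh
the aggregate can be.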



On Tue, Mar 4, 2014 at 10:20 PM, Yexi Jiang <ye...@gmail.com> wrote:
> I am very interested in this topic since my research area includes event
> mining, but can BSP do real-time computing?
>
> I once used a message-queue-based solution to collect the event logs.
>
>
>
>
>
> --
> ------
> Yexi Jiang,
> ECS 251,  yjian004@cs.fiu.edu
> School of Computer and Information Science,
> Florida International University
> Homepage: http://users.cis.fiu.edu/~yjian004/



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.

Re: [jira] [Updated] (HAMA-883) [Research Task] Massive log event aggregation in real time using Apache Hama

Posted by Yexi Jiang <ye...@gmail.com>.
I am very interested in this topic since my research area includes event
mining, but can BSP do real-time computing?

I once used a message-queue-based solution to collect the event logs.





-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/