Posted to general@incubator.apache.org by "Edward J. Yoon" <ed...@apache.org> on 2014/04/28 04:23:31 UTC

[DISCUSS] Meerkat as a Hama sub-module or sub-project

Hi guys,

As some of you may already know, I've recently been working on a
real-time data processing project on top of the Hama BSP model, called
Meerkat[1] (currently only a few developers, from two organizations,
Kakaotalk and DataSayer, are involved in this project).

According to our internal study, a Storm-like, DAG-style,
fault-tolerant stream processing framework can be implemented on top
of the Hama BSP model. Furthermore, we think it may be possible to
connect it to other BSP applications using some smart input and output
formats in the future, e.g., streaming graph or learning BSP
applications. Thus, we believe that we may be able to have a
next-generation architecture that processes and analyzes data rapidly
in real time, beyond the complex old-style architecture of collecting,
storing, ordering, processing, and analyzing data.

Does this make sense to you? If so, I'd like to start moving from
GitHub to the ASF soon, although this is at a very *early* stage,
because it's closely related to Apache Hama and others. Should it be a
Hama sub-module, a Hama sub-project, or an Apache Incubator project?
Which do you think is best?

I'm also CC'ing general@i.a.o to get more feedback (from the Apache big
data OSS communities).

1. https://github.com/datasayer/meerkat

-- 
Best Regards, Edward J. Yoon

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org


Re: [DISCUSS] Meerkat as a Hama sub-module or sub-project

Posted by InJun Song <ij...@gmail.com>.
I also agree with the suggestions of Ted and Tommaso.
Since Meerkat will be developed on top of Hama, it can be managed under
the Hama project, like hama-graph. In addition, developing Meerkat will
not change Hama much (if at all), because Hama is a generic BSP
framework.



On Mon, Apr 28, 2014 at 6:02 PM, Tommaso Teofili
<to...@gmail.com>wrote:

>
>
>
> 2014-04-28 10:45 GMT+02:00 Ted Dunning <te...@gmail.com>:
>
>> Edward,
>>
>> Sub-projects are generally frowned on.  Incubator projects can graduate
>> eventually to a top-level project, but starting a project as a sub is not
>> so good.  The incubator docs on this are pretty good reading.
>>
>> But if the committer community for your real-time BSP is (or can be) the
>> same as the committer community for Hama itself, you might want to just
>> fold this new code directly into Hama itself.  No need in that case for a
>> separate project.
>>
>> If the committer community is very different, then a separate project is
>> warranted.
>>
>> I don't think that the Hama development is such a high bandwidth thing
>> that
>> splitting is required.  To my mind that says that joining groups together
>> is better than breaking them apart.  Together, the two efforts can feed
>> off
>> each other.  Apart, they could each run down due to lack of interest.
>>
>
> +1 to Ted's suggestion, we can create a separate directory on trunk for
> Meerkat (or a branch) and eventually have separate release processes for
> Hama core stuff and Meerkat stuff if that's needed.
>
>
>>
>> You will know much better than any of us the details of your communities.
>>
>
> as far as I can see Meerkat is being developed by Edward and another
> committer so it should be relatively straightforward to let him/her join
> the Hama community.
>
> My 2 cents,
> Tommaso
>
>
>>
>>
>>
>>
>>
>> On Mon, Apr 28, 2014 at 4:23 AM, Edward J. Yoon <edwardyoon@apache.org
>> >wrote:
>>
>> > Hi guys,
>> >
>> > As some people already might know, I'm recently working on real-time
>> > data processing project on top of Hama BSP model, called Meerkat[1]
>> > (currently only few developers (from two organizations, Kakaotalk and
>> > DataSayer) are involved in this project).
>> >
>> > According to our internal study, Storm-like DAG-style and
>> > fault-tolerant streaming processing framework can be implemented on
>> > top of Hama BSP model. And, furthermore, we're thinking, it may be
>> > possible to connect to other BSP applications using some smart input
>> > and output formats in the future e.g., streaming graph or learning BSP
>> > applications. Thus, we believe that we may be able to have an
>> > next-generation architecture that processes and analyzes the data
>> > rapidly in real-time, beyond complex old-style data collecting,
>> > storing, ordering, processing, and analyzing architecture.
>> >
>> > Does it make sense to you? If so, I'd like to start to move from
>> > github to ASF soon, although this is very *early* stage. Because, it's
>> > highly related with Apache Hama and others. Hama sub-module or
>> > sub-project, (or Apache incubator?). Which is best you think?
>> >
>> > I'm also CC'ing general@i.a.o to see more feedbacks (from Apache big
>> > data OSS communities).
>> >
>> > 1. https://github.com/datasayer/meerkat
>> >
>> > --
>> > Best Regards, Edward J. Yoon
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> > For additional commands, e-mail: general-help@incubator.apache.org
>> >
>> >
>>
>
>

Re: [DISCUSS] Meerkat as a Hama sub-module or sub-project

Posted by Tommaso Teofili <to...@gmail.com>.
2014-04-28 10:45 GMT+02:00 Ted Dunning <te...@gmail.com>:

> Edward,
>
> Sub-projects are generally frowned on.  Incubator projects can graduate
> eventually to a top-level project, but starting a project as a sub is not
> so good.  The incubator docs on this are pretty good reading.
>
> But if the committer community for your real-time BSP is (or can be) the
> same as the committer community for Hama itself, you might want to just
> fold this new code directly into Hama itself.  No need in that case for a
> separate project.
>
> If the committer community is very different, then a separate project is
> warranted.
>
> I don't think that the Hama development is such a high bandwidth thing that
> splitting is required.  To my mind that says that joining groups together
> is better than breaking them apart.  Together, the two efforts can feed off
> each other.  Apart, they could each run down due to lack of interest.
>

+1 to Ted's suggestion. We can create a separate directory on trunk for
Meerkat (or a branch) and eventually have separate release processes for
Hama core and Meerkat if that's needed.


>
> You will know much better than any of us the details of your communities.
>

As far as I can see, Meerkat is being developed by Edward and another
committer, so it should be relatively straightforward to let him/her join
the Hama community.

My 2 cents,
Tommaso


>
>
>
>
>
> On Mon, Apr 28, 2014 at 4:23 AM, Edward J. Yoon <edwardyoon@apache.org
> >wrote:
>
> > Hi guys,
> >
> > As some people already might know, I'm recently working on real-time
> > data processing project on top of Hama BSP model, called Meerkat[1]
> > (currently only few developers (from two organizations, Kakaotalk and
> > DataSayer) are involved in this project).
> >
> > According to our internal study, Storm-like DAG-style and
> > fault-tolerant streaming processing framework can be implemented on
> > top of Hama BSP model. And, furthermore, we're thinking, it may be
> > possible to connect to other BSP applications using some smart input
> > and output formats in the future e.g., streaming graph or learning BSP
> > applications. Thus, we believe that we may be able to have an
> > next-generation architecture that processes and analyzes the data
> > rapidly in real-time, beyond complex old-style data collecting,
> > storing, ordering, processing, and analyzing architecture.
> >
> > Does it make sense to you? If so, I'd like to start to move from
> > github to ASF soon, although this is very *early* stage. Because, it's
> > highly related with Apache Hama and others. Hama sub-module or
> > sub-project, (or Apache incubator?). Which is best you think?
> >
> > I'm also CC'ing general@i.a.o to see more feedbacks (from Apache big
> > data OSS communities).
> >
> > 1. https://github.com/datasayer/meerkat
> >
> > --
> > Best Regards, Edward J. Yoon
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > For additional commands, e-mail: general-help@incubator.apache.org
> >
> >
>

Re: [DISCUSS] Meerkat as a Hama sub-module or sub-project

Posted by "Edward J. Yoon" <ed...@apache.org>.
Thanks Ted. I generally agree with your suggestion.


On Mon, Apr 28, 2014 at 5:45 PM, Ted Dunning <te...@gmail.com> wrote:
> Edward,
>
> Sub-projects are generally frowned on.  Incubator projects can graduate
> eventually to a top-level project, but starting a project as a sub is not
> so good.  The incubator docs on this are pretty good reading.
>
> But if the committer community for your real-time BSP is (or can be) the
> same as the committer community for Hama itself, you might want to just
> fold this new code directly into Hama itself.  No need in that case for a
> separate project.
>
> If the committer community is very different, then a separate project is
> warranted.
>
> I don't think that the Hama development is such a high bandwidth thing that
> splitting is required.  To my mind that says that joining groups together
> is better than breaking them apart.  Together, the two efforts can feed off
> each other.  Apart, they could each run down due to lack of interest.
>
> You will know much better than any of us the details of your communities.
>
>
>
>
>
> On Mon, Apr 28, 2014 at 4:23 AM, Edward J. Yoon <ed...@apache.org>wrote:
>
>> Hi guys,
>>
>> As some people already might know, I'm recently working on real-time
>> data processing project on top of Hama BSP model, called Meerkat[1]
>> (currently only few developers (from two organizations, Kakaotalk and
>> DataSayer) are involved in this project).
>>
>> According to our internal study, Storm-like DAG-style and
>> fault-tolerant streaming processing framework can be implemented on
>> top of Hama BSP model. And, furthermore, we're thinking, it may be
>> possible to connect to other BSP applications using some smart input
>> and output formats in the future e.g., streaming graph or learning BSP
>> applications. Thus, we believe that we may be able to have an
>> next-generation architecture that processes and analyzes the data
>> rapidly in real-time, beyond complex old-style data collecting,
>> storing, ordering, processing, and analyzing architecture.
>>
>> Does it make sense to you? If so, I'd like to start to move from
>> github to ASF soon, although this is very *early* stage. Because, it's
>> highly related with Apache Hama and others. Hama sub-module or
>> sub-project, (or Apache incubator?). Which is best you think?
>>
>> I'm also CC'ing general@i.a.o to see more feedbacks (from Apache big
>> data OSS communities).
>>
>> 1. https://github.com/datasayer/meerkat
>>
>> --
>> Best Regards, Edward J. Yoon
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> For additional commands, e-mail: general-help@incubator.apache.org
>>
>>



-- 
Best Regards, Edward J. Yoon
CEO at DataSayer Co., Ltd.


Re: [DISCUSS] Meerkat as a Hama sub-module or sub-project

Posted by Ted Dunning <te...@gmail.com>.
Edward,

Sub-projects are generally frowned on.  Incubator projects can graduate
eventually to a top-level project, but starting a project as a sub is not
so good.  The incubator docs on this are pretty good reading.

But if the committer community for your real-time BSP is (or can be) the
same as the committer community for Hama itself, you might want to just
fold this new code directly into Hama itself.  No need in that case for a
separate project.

If the committer community is very different, then a separate project is
warranted.

I don't think that the Hama development is such a high bandwidth thing that
splitting is required.  To my mind that says that joining groups together
is better than breaking them apart.  Together, the two efforts can feed off
each other.  Apart, they could each run down due to lack of interest.

You will know much better than any of us the details of your communities.





On Mon, Apr 28, 2014 at 4:23 AM, Edward J. Yoon <ed...@apache.org>wrote:

> Hi guys,
>
> As some people already might know, I'm recently working on real-time
> data processing project on top of Hama BSP model, called Meerkat[1]
> (currently only few developers (from two organizations, Kakaotalk and
> DataSayer) are involved in this project).
>
> According to our internal study, Storm-like DAG-style and
> fault-tolerant streaming processing framework can be implemented on
> top of Hama BSP model. And, furthermore, we're thinking, it may be
> possible to connect to other BSP applications using some smart input
> and output formats in the future e.g., streaming graph or learning BSP
> applications. Thus, we believe that we may be able to have an
> next-generation architecture that processes and analyzes the data
> rapidly in real-time, beyond complex old-style data collecting,
> storing, ordering, processing, and analyzing architecture.
>
> Does it make sense to you? If so, I'd like to start to move from
> github to ASF soon, although this is very *early* stage. Because, it's
> highly related with Apache Hama and others. Hama sub-module or
> sub-project, (or Apache incubator?). Which is best you think?
>
> I'm also CC'ing general@i.a.o to see more feedbacks (from Apache big
> data OSS communities).
>
> 1. https://github.com/datasayer/meerkat
>
> --
> Best Regards, Edward J. Yoon
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> For additional commands, e-mail: general-help@incubator.apache.org
>
>
