Posted to general@incubator.apache.org by Avery Ching <ac...@yahoo-inc.com> on 2011/07/15 20:14:53 UTC

[PROPOSAL] Proposing Giraph for the Apache Incubator

Hi,

I would like to propose Giraph as an Apache Incubator project.  Giraph is a large-scale graph processing infrastructure (inspired by Pregel) that runs entirely on Hadoop.  Giraph applications and MapReduce jobs coexist on shared Hadoop instances and Giraph applications can be part of Oozie workflows as a normal MapReduce job.

Here is a link to the proposal in our GitHub wiki:

https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal

The proposal is also inlined below:

Thanks!

Avery



= Giraph : Large-scale graph processing on Hadoop =

== Abstract ==

Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel (BSP)-based graph processing framework.

== Proposal ==

Graph processing platforms to run large-scale algorithms (such as page rank, shared connections, personalization-based popularity, etc.) have become quite popular.  Some recent examples include Pregel and HaLoop.  For general-purpose big data computation, the MapReduce computation model is widely adopted and the most deployed MapReduce infrastructure is Apache Hadoop.  We have implemented a graph-processing framework that is launched as a typical Hadoop MapReduce job to leverage existing Hadoop infrastructure, such as Amazon’s EC2.  Giraph builds upon the graph-oriented nature of Pregel but additionally adds fault-tolerance to the coordinator process with the use of ZooKeeper as its centralized coordination service.  Additionally, Giraph will include a library of generic graph algorithms.
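
As a rough illustration of the Pregel-style BSP model described above, here is a minimal single-process sketch in Python.  It is not Giraph's API: the function name `bsp_max_value` and the dictionary-based graph are hypothetical, chosen only to show vertices exchanging messages between supersteps until every vertex has halted.

```python
from collections import defaultdict


def bsp_max_value(edges, values):
    """Toy BSP run: spread the largest vertex value to every reachable vertex.

    edges  -- dict mapping a vertex to a list of its out-neighbors (hypothetical layout)
    values -- dict mapping a vertex to its initial numeric value
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)

    # Superstep 0: every vertex sends its own value to its out-neighbors.
    inbox = defaultdict(list)
    for vertex, value in values.items():
        for neighbor in edges.get(vertex, []):
            inbox[neighbor].append(value)
    supersteps = 1

    # Each loop iteration is one superstep.  Messages sent during a superstep
    # are only delivered in the next one, which stands in for the barrier.
    while inbox:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for neighbor in edges.get(vertex, []):
                    outbox[neighbor].append(best)
            # A vertex that learns nothing new sends nothing (it "votes to halt").
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    initial = {"a": 3, "b": 6, "c": 2}
    print(bsp_max_value(graph, initial))
```

In a real deployment the vertices would be partitioned across workers and the barrier would be a distributed synchronization step; this sketch collapses all of that into one loop.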

== Background ==

Giraph began development as a side project at Yahoo! at the end of 2010.  It was made functional in a month, and various features were then added.  Development has focused on internal customers' needs until this point.

== Rationale ==

Web and online social graphs have been rapidly growing in size and scale during the past decade.  In 2008, Google estimated that the number of web pages reached over a trillion.  Online social networking and email sites, including Yahoo!, Google, Microsoft, Facebook, LinkedIn, and Twitter, have hundreds of millions of users and are expected to grow much more in the future.  Processing these graphs plays a big role in relevant and personalized information for users, such as results from a search engine or news in an online social networking site.

== Initial Goals ==

At this point, most of the functionality has been implemented and we are looking to get more adoption and contributions from users outside Yahoo!.   We want to ensure that performance scales and that the code is robust and fault tolerant.

== Current Status ==

=== Meritocracy ===

Giraph was initially developed by Avery Ching and Christian Kunz beginning in December 2010 at Yahoo!.  Other developers at Yahoo! who use Giraph are making suggestions and adding code.  We are reaching out to other folks at social networking companies for additional usage and development.

=== Community ===

Several groups who are interested in either joining our project or using our code have contacted us.  We certainly believe that there is a lot of interest and are actively looking to improve and expand the community.

=== Core Developers ===

Avery Ching: Wrote a majority of the code
Christian Kunz: Wrote most of the communication code and security integration with Hadoop

=== Alignment ===

Giraph uses several Apache projects as its underlying infrastructure (Hadoop and ZooKeeper).  It is also built with Apache Maven.

== Known Risks ==

=== Orphaned products ===

There are many social networking companies that would be interested in using this graph-processing framework and we have already received interest from some of them.  Yahoo! is already using this code in production and will certainly continue to use it in the future as well.

=== Inexperience with Open Source ===

While the initial developers have limited experience contributing to open-source projects, Yahoo! as a company has a strong commitment to open source and we have several advisors that we can ask for help.

=== Homogenous Developers ===

At this time, the project is relatively young and the developers work at only two companies (Yahoo! and Jybe).  However, given the interest we have seen in the project, we expect the diversity to improve in the near future.

=== Reliance on Salaried Developers ===

Currently Giraph is being developed by a combination of salaried and volunteer time.  We expect that other corporations will take an interest in this project and likely contribute with salaried developers.  Some individuals will likely spend volunteer time on it as well.  It is still early in the project and we are hoping for a lot of growth.

=== Relationships with Other Apache Products ===

Giraph depends on many Apache projects: Hadoop, ZooKeeper, Log4j, Commons, etc.  It is built using Apache Maven.

Giraph has some overlapping functionality with Apache Hama.  However, there are some significant differences.  Giraph focuses on graph-based bulk synchronous parallel (BSP) computing, while Apache Hama is more for general-purpose BSP computing.  Giraph runs on the Hadoop infrastructure, while Apache Hama uses its own computing framework.

=== An Excessive Fascination with the Apache Brand ===

The Apache brand is likely to help us find contributors.  However, our interest in Apache is primarily because the other projects that we depend on are also Apache projects, and it makes sense that all this software be available from the same place.

=== Documentation ===

Currently we have little documentation, but several examples.  We are working on improving this situation.

=== Initial Source ===

The initial source code comes from Yahoo!, where development began in December 2010.  It is already available on GitHub at https://github.com/aching/Giraph.

=== Source and Intellectual Property Submission Plan ===

We intend the entire code base to be licensed under the Apache License, Version 2.0.

=== External Dependencies ===

The required dependencies all have Apache-compatible licenses.  The components with non-Apache licenses are enumerated below:
* JSON – Public Domain

=== Cryptography ===

Giraph depends on secure Hadoop that can optionally use Kerberos.

== Required Resources ==

=== Mailing lists ===

* giraph-private (with moderated subscriptions)
* giraph-dev
* giraph-commits
* giraph-users

=== Subversion Directory ===

https://svn.apache.org/repos/asf/incubator/giraph

=== Issue Tracking ===

JIRA Giraph (GIRAPH)

=== Other Resources ===

Giraph has integration tests that can be run with the LocalJobRunner.  These same tests are also designed to be run on a small (even single node) Hadoop cluster.  While not required at this time, it would be nice if such a resource were available.

=== Initial Committers ===

Avery Ching, aching at yahoo-inc dot com
Christian Kunz, christian at jybe-inc dot com
Owen O’Malley, owen at hortonworks dot com

=== Affiliations ===

Avery Ching, Yahoo!
Christian Kunz, Jybe

== Sponsors ==

=== Champion ===

Owen O’Malley

=== Nominated Mentors ===

Owen O’Malley

=== Sponsoring Entity ===

Apache Incubator PMC

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Phillip Rhodes <mo...@gmail.com>.

On Fri, Jul 15, 2011 at 2:14 PM, Avery Ching <ac...@yahoo-inc.com> wrote:

> Hi,
>
> I would like to propose Giraph as an Apache Incubator project.  Giraph is a
> large-scale graph processing infrastructure (inspired by Pregel) that runs
> entirely on Hadoop.  Giraph applications and MapReduce jobs coexist on
> shared Hadoop instances and Giraph applications can be part of Oozie
> workflows as a normal MapReduce job.
>
> Here is a link to the proposal in our GitHub wiki:
>
> https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal
>
> The proposal is also inlined below:
>

+1

Additionally, I'd be happy to help with this project.  This touches on a
pretty specific area of interest for me personally.


Cheers,


Phil

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Owen O'Malley <om...@apache.org>.

On Fri, Jul 15, 2011 at 11:14 AM, Avery Ching <ac...@yahoo-inc.com> wrote:

> Hi,
>
> I would like to propose Giraph as an Apache Incubator project.


Obviously, I'm +1 for this. *smile*

-- Owen

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by "Edward J. Yoon" <ed...@apache.org>.

Based on that statement, I expect that if Giraph is accepted in the Apache Incubator, our projects will hopefully be able to share ideas and grow together.

+1

Sent from my iPhone

On 2011. 7. 16., at 12:55 PM, Avery Ching <ac...@yahoo-inc.com> wrote:

> 
> Based on that statement, I expect that if Giraph is accepted in the Apache Incubator, our projects will hopefully be able to share ideas and grow together.

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Avery Ching <ac...@yahoo-inc.com>.

Ed,

Offline, you and I have discussed potential future collaboration in our projects.  However, there are significant differences in our approaches today.

* Hama has been focused on BSP computing.  It only recently (June 30 - about 16 days ago) opened a JIRA for graph processing (https://issues.apache.org/jira/browse/HAMA-409).  Giraph has been focused on BSP-based graph processing from day one.
* Giraph runs entirely on the Hadoop infrastructure today.  It is meant to be used on shared Hadoop clusters and integrated as part of Oozie pipelines.  Today, Hama uses its own infrastructure and is pretty much a stand-alone system, only using HDFS.
* Giraph has focused on fault-tolerance and dynamic resource usage in a shared Hadoop cluster.  These are infrastructure-specific challenges that have a lot of value for our users and we will continue to focus and improve on this.

As we have discussed, things may change when next-gen Hadoop is released; however, that might take some time.  And when it is released, it will take some time for it to be stable enough to deploy to our installations.  I think it is productive for us to share ideas (as we have been doing), but also useful to have separate projects, as they are different enough now and cater to different sets of users.

Even if these projects do overlap one day, under the incubator proposal guidelines (http://incubator.apache.org/guides/proposal.html) in the section 'Relationships with Other Apache Products', it reads:

"Apache allows different projects to have competing or overlapping goals. However, this should mean friendly competition between codebases and cordial cooperation between communities.

It is not always obvious whether a candidate is a direct competitor to an existing project, an indirect competitor (same problem space, different ecological niche) or are just peers with some overlap. In the case of indirect competition, it is important that the abstract describes accurately the niche. Direct competitors should expect to be asked to summarize architectural differences and similarities to existing projects."

Based on that statement, I expect that if Giraph is accepted in the Apache Incubator, our projects will hopefully be able to share ideas and grow together.

Thanks,

Avery

On Jul 15, 2011, at 4:29 PM, Edward J. Yoon wrote:

Just FYI,

My main concern is that the boundaries you describe between 'Apache Hama'
and 'Giraph' could collapse in the near future.

* Someone has already contributed a Pregel-like vertex API set on top of
Hama v0.2 [1].
* Hama jobs will run on both Hama's own cluster and Hadoop nextGen.

Then the only difference is BSP-based computing vs. BSP-based graph-*only* computing.

Regarding this proposal, I'm +0.

Thanks,
Ed

1. https://issues.apache.org/jira/browse/HAMA-409

--
Best Regards, Edward J. Yoon
@eddieyoon

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by "Edward J. Yoon" <ed...@apache.org>.

Just FYI,

My main concern is that the boundaries you describe between 'Apache Hama'
and 'Giraph' could collapse in the near future.

 * Someone has already contributed a Pregel-like vertex API set on top of
Hama v0.2 [1].
 * Hama jobs will run on both Hama's own cluster and Hadoop nextGen.

Then the only difference is BSP-based computing vs. BSP-based graph-*only* computing.

Regarding this proposal, I'm +0.

Thanks,
Ed

1. https://issues.apache.org/jira/browse/HAMA-409

-- 
Best Regards, Edward J. Yoon
@eddieyoon

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Hyunsik Choi <hy...@apache.org>.

I read the Giraph proposal. It's very interesting! I have some
experience in graph processing with MapReduce. I think Giraph's approach
is very promising, since MapReduce is already regarded as the de facto
standard for processing large data and is suitable for many graph
algorithms. If there were a general graph package for MapReduce, it would be
widely used in many areas. I would like to participate in the Giraph project.

Best regards,
Hyunsik Choi




Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Avery Ching <ac...@yahoo-inc.com>.

Mohammad,

Thank you for looking at the proposal.  I added the twiki:

http://wiki.apache.org/incubator/GiraphProposal

Avery

On Jul 17, 2011, at 12:03 AM, Mohammad Nour El-Din wrote:

+1 on the proposal

And for the point raised by Edward, I agree with his concerns, but
also it is up to the mentors and development team to manage this kind
of syncing between them and other Apache projects.

As for putting the proposal on the Incubator wiki, Avery, would you
please do that? :)

On Sat, Jul 16, 2011 at 6:42 AM, Phillip Rhodes
<mo...@gmail.com> wrote:
On Fri, Jul 15, 2011 at 2:14 PM, Avery Ching <ac...@yahoo-inc.com> wrote:

Hi,

Here is a link to the proposal in our GitHub wiki:

https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal


Should the contents of this proposal be copied to the incubator section
of the ASF wiki, at something like:

http://wiki.apache.org/incubator/GiraphProposal


Cheers,


Phil


<http://wiki.apache.org/incubator/GiraphProposal>




--
Thanks
- Mohammad Nour
  Author of (WebSphere Application Server Community Edition 2.0 User Guide)
  http://www.redbooks.ibm.com/abstracts/sg247585.html
- LinkedIn: http://www.linkedin.com/in/mnour
- Blog: http://tadabborat.blogspot.com
----
"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein

"Writing clean code is what you must do in order to call yourself a
professional. There is no reasonable excuse for doing anything less
than your best."
- Clean Code: A Handbook of Agile Software Craftsmanship

"Stay hungry, stay foolish."
- Steve Jobs


Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Mohammad Nour El-Din <no...@gmail.com>.

+1 on the proposal

And for the point raised by Edward, I agree with his concerns, but
also it is up to the mentors and development team to manage this kind
of syncing between them and other Apache projects.

As for putting the proposal on the Incubator wiki, Avery, would you
please do that? :)

On Sat, Jul 16, 2011 at 6:42 AM, Phillip Rhodes
<mo...@gmail.com> wrote:
> On Fri, Jul 15, 2011 at 2:14 PM, Avery Ching <ac...@yahoo-inc.com> wrote:
>
>> Hi,
>>
>> Here is a link to the proposal in our GitHub wiki:
>>
>> https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal
>>
>>
> Should the contents of this proposal be copied to the incubator section
> of the ASF wiki, at something like:
>
> http://wiki.apache.org/incubator/GiraphProposal
>
>
> Cheers,
>
>
> Phil
>
>
> <http://wiki.apache.org/incubator/GiraphProposal>
>



-- 
Thanks
- Mohammad Nour
  Author of (WebSphere Application Server Community Edition 2.0 User Guide)
  http://www.redbooks.ibm.com/abstracts/sg247585.html
- LinkedIn: http://www.linkedin.com/in/mnour
- Blog: http://tadabborat.blogspot.com
----
"Life is like riding a bicycle. To keep your balance you must keep moving"
- Albert Einstein

"Writing clean code is what you must do in order to call yourself a
professional. There is no reasonable excuse for doing anything less
than your best."
- Clean Code: A Handbook of Agile Software Craftsmanship

"Stay hungry, stay foolish."
- Steve Jobs


Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Jakob Homan <jg...@gmail.com>.

We're quite interested in helping to develop Giraph and have been
experimenting with it here.  I'd like to add myself to the initial
list of committers, if there are no objections.  My background: Apache
Hadoop committer and PMC member, committer on Apache Kafka in
Incubator and contributor to other Hadoop ecosystem projects.

Thanks,
Jakob




On Mon, Jul 18, 2011 at 5:45 PM, Hyunsik Choi <hy...@apache.org> wrote:
> I have experience with graph processing, as follows:
> * RDF pattern matching and storage system on HBase
> * some types of graph counting
> * subgraph isomorphism on large graph data with MapReduce (I'm writing a
> research paper on this)
> * all-pairs shortest paths with CUDA
>
> I have just added myself to the list of "initial committers" on the wiki.
>
> Cheers,
> --
> Hyunsik Choi
>
> On Tue, Jul 19, 2011 at 3:33 AM, Phillip Rhodes
> <mo...@gmail.com> wrote:
>
>> Cool.  I've added myself to the list of "Initial Committers" on the wiki.
>> (As I understand it, during the proposal phase of an incubator project,
>> anyone can jump in and volunteer to be a committer.  But if anyone wants
>> to know more about me or my background, feel free to ask.)
>>
>>
>> Phil
>>
>> On Sun, Jul 17, 2011 at 7:52 AM, Avery Ching <ac...@yahoo-inc.com> wrote:
>>
>> > Phillip,
>> >
>> > Thank you for your suggestion.  I've added the proposal as you suggested to
>> > the Apache wiki.
>> >
>> > http://wiki.apache.org/incubator/GiraphProposal
>> >
>> > I'm glad to hear the project interests you and hope you get some time to
>> > take a look at it.
>> >
>> > Avery
>> >
>> > On Jul 15, 2011, at 6:42 PM, Phillip Rhodes wrote:
>> >
>> > On Fri, Jul 15, 2011 at 2:14 PM, Avery Ching <aching@yahoo-inc.com> wrote:
>> >
>> > Hi,
>> >
>> > Here is a link to the proposal in our GitHub wiki:
>> >
>> > https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal
>> >
>> >
>> > Should the contents of this proposal be copied to the incubator section
>> > of the ASF wiki, at something like:
>> >
>> > http://wiki.apache.org/incubator/GiraphProposal
>> >
>> >
>> > Cheers,
>> >
>> >
>> > Phil
>> >
>> >
>> > <http://wiki.apache.org/incubator/GiraphProposal>
>> >
>> >
>>
>


Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Hyunsik Choi <hy...@apache.org>.

I have experience with graph processing, as follows:
* RDF pattern matching and storage system on HBase
* some types of graph counting
* subgraph isomorphism on large graph data with MapReduce (I'm writing a
research paper on this)
* all-pairs shortest paths with CUDA

I have just added myself to the list of "initial committers" on the wiki.

Cheers,
--
Hyunsik Choi

On Tue, Jul 19, 2011 at 3:33 AM, Phillip Rhodes
<mo...@gmail.com> wrote:

> Cool.  I've added myself to the list of "Initial Committers" on the wiki.
> (As I understand it, during the proposal phase of an incubator project,
> anyone can jump in and volunteer to be a committer.  But if anyone wants
> to know more about me or my background, feel free to ask.)
>
>
> Phil
>
> On Sun, Jul 17, 2011 at 7:52 AM, Avery Ching <ac...@yahoo-inc.com> wrote:
>
> > Phillip,
> >
> > Thank you for your suggestion.  I've added the proposal as you suggested to
> > the Apache wiki.
> >
> > http://wiki.apache.org/incubator/GiraphProposal
> >
> > I'm glad to hear the project interests you and hope you get some time to
> > take a look at it.
> >
> > Avery
> >
> > On Jul 15, 2011, at 6:42 PM, Phillip Rhodes wrote:
> >
> > On Fri, Jul 15, 2011 at 2:14 PM, Avery Ching <aching@yahoo-inc.com> wrote:
> >
> > Hi,
> >
> > Here is a link to the proposal in our GitHub wiki:
> >
> > https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal
> >
> >
> > Should the contents of this proposal be copied to the incubator section
> > of the ASF wiki, at something like:
> >
> > http://wiki.apache.org/incubator/GiraphProposal
> >
> >
> > Cheers,
> >
> >
> > Phil
> >
> >
> > <http://wiki.apache.org/incubator/GiraphProposal>
> >
> >
>

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Phillip Rhodes <mo...@gmail.com>.

Cool.  I've added myself to the list of "Initial Committers" on the wiki.
(As I understand it, during the proposal phase of an incubator project,
anyone can jump in and volunteer to be a committer.  But if anyone wants
to know more about me or my background, feel free to ask.)


Phil

On Sun, Jul 17, 2011 at 7:52 AM, Avery Ching <ac...@yahoo-inc.com> wrote:

> Phillip,
>
> Thank you for your suggestion.  I've added the proposal as you suggested to
> the Apache wiki.
>
> http://wiki.apache.org/incubator/GiraphProposal
>
> I'm glad to hear the project interests you and hope you get some time to
> take a look at it.
>
> Avery
>
> On Jul 15, 2011, at 6:42 PM, Phillip Rhodes wrote:
>
> On Fri, Jul 15, 2011 at 2:14 PM, Avery Ching <aching@yahoo-inc.com> wrote:
>
> Hi,
>
> Here is a link to the proposal in our GitHub wiki:
>
> https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal
>
>
> Should the contents of this proposal be copied to the incubator section
> of the ASF wiki, at something like:
>
> http://wiki.apache.org/incubator/GiraphProposal
>
>
> Cheers,
>
>
> Phil
>
>
> <http://wiki.apache.org/incubator/GiraphProposal>
>
>

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Avery Ching <ac...@yahoo-inc.com>.

Phillip,

Thank you for your suggestion.  I've added the proposal as you suggested to the Apache wiki.

http://wiki.apache.org/incubator/GiraphProposal

I'm glad to hear the project interests you and hope you get some time to take a look at it.

Avery

On Jul 15, 2011, at 6:42 PM, Phillip Rhodes wrote:

On Fri, Jul 15, 2011 at 2:14 PM, Avery Ching <ac...@yahoo-inc.com> wrote:

Hi,

Here is a link to the proposal in our GitHub wiki:

https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal


Should the contents of this proposal be copied to the incubator section
of the ASF wiki, at something like:

http://wiki.apache.org/incubator/GiraphProposal


Cheers,


Phil


<http://wiki.apache.org/incubator/GiraphProposal>

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Phillip Rhodes <mo...@gmail.com>.

On Fri, Jul 15, 2011 at 2:14 PM, Avery Ching <ac...@yahoo-inc.com> wrote:

> Hi,
>
> Here is a link to the proposal in our GitHub wiki:
>
> https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal
>
>
Should the contents of this proposal be copied to the incubator section
of the ASF wiki, at something like:

http://wiki.apache.org/incubator/GiraphProposal


Cheers,


Phil


<http://wiki.apache.org/incubator/GiraphProposal>

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Avery Ching <ac...@yahoo-inc.com>.

Hello,

Thank you for taking a look at the proposal and for your questions.  I've responded to your questions inline.

Avery

On Jul 23, 2011, at 5:38 AM, florent andré wrote:

> Hi Avery,
> 
> Be careful, newbie here ! :)
> 
> I read your proposal with attention and also this presentation [1].
> 
> So my questions are :
> - What are the differences / similarities between Giraph and triple
> stores like Jena ?

Giraph is a graph processing infrastructure that has input and output formats for the user to denote how to load/store their graph.  Users are free to use whatever backend store they would like (e.g., HDFS, HBase, neo4j, a triple store, etc.).

> - Does Giraph provide (or will provide) a convenient way to "request / 
> query" graph (like sparql for example) ?
> 

Giraph is not a query language.  It is meant to run large-scale graph algorithms.
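
To illustrate the difference between running a graph algorithm and issuing a query, here is a toy, single-process sketch of a vertex-centric BSP computation in the Pregel style: every vertex repeatedly takes the maximum of its own value and the values it receives, and forwards its value only when it changed.  The class and method names (BspMaxValueSketch, propagateMax, send) are hypothetical and are not the Giraph API; a real job would distribute the vertices across workers and would not hold the whole graph in one map.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy illustration of vertex-centric BSP supersteps; hypothetical names, not the Giraph API. */
public class BspMaxValueSketch {

    /** Runs supersteps until no messages are in flight and returns the final vertex values. */
    static Map<Integer, Integer> propagateMax(Map<Integer, Integer> initialValues,
                                              Map<Integer, List<Integer>> outEdges) {
        Map<Integer, Integer> values = new HashMap<>(initialValues);

        // Superstep 0: every vertex sends its value along its out-edges.
        Map<Integer, List<Integer>> inbox = new HashMap<>();
        for (Map.Entry<Integer, Integer> entry : values.entrySet()) {
            send(outEdges, entry.getKey(), entry.getValue(), inbox);
        }

        // Later supersteps: only vertices that received messages do any work.
        while (!inbox.isEmpty()) {
            Map<Integer, List<Integer>> nextInbox = new HashMap<>();
            for (Map.Entry<Integer, List<Integer>> entry : inbox.entrySet()) {
                int vertex = entry.getKey();
                int current = values.getOrDefault(vertex, Integer.MIN_VALUE);
                int best = current;
                for (int message : entry.getValue()) {
                    best = Math.max(best, message);
                }
                if (best > current) {
                    // The value changed, so tell the neighbors in the next superstep.
                    values.put(vertex, best);
                    send(outEdges, vertex, best, nextInbox);
                }
            }
            // Swapping the inboxes plays the role of the synchronization barrier.
            inbox = nextInbox;
        }
        return values;
    }

    /** Queues 'value' for delivery to every out-neighbor of 'from'. */
    private static void send(Map<Integer, List<Integer>> outEdges, int from, int value,
                             Map<Integer, List<Integer>> inbox) {
        for (int target : outEdges.getOrDefault(from, List.of())) {
            inbox.computeIfAbsent(target, key -> new ArrayList<>()).add(value);
        }
    }

    public static void main(String[] args) {
        // A three-vertex cycle 1 -> 2 -> 3 -> 1 with initial values 3, 6, 2.
        Map<Integer, Integer> values = Map.of(1, 3, 2, 6, 3, 2);
        Map<Integer, List<Integer>> edges =
                Map.of(1, List.of(2), 2, List.of(3), 3, List.of(1));
        System.out.println(propagateMax(values, edges));
    }
}
```

On the cycle in main, every vertex ends up holding 6, the largest initial value.  The computation stops by itself once a superstep produces no messages, which is a simplified stand-in for vertices voting to halt.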

> Maybe these are silly questions, but from a 100-foot point of view both are 
> about graph processing...and surely have a big difference I can't see 
> with my baby eyes...
> 
> Thanks for your insights
> Have a good path with Giraph ! :)
> 

No problem.  Let me know if you have any other questions.

> [1] http://www.slideshare.net/averyching/20110628giraph-hadoop-summit

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by florent andré <fl...@4sengines.com>.

Hi Avery,

Be careful, newbie here ! :)

I read your proposal with attention and also this presentation [1].

So my questions are :
- What are the differences / similarities between Giraph and triple
stores like Jena ?
- Does Giraph provide (or will provide) a convenient way to "request / 
query" graph (like sparql for example) ?

Maybe these are silly questions, but from a 100-foot point of view both are 
about graph processing...and surely have a big difference I can't see 
with my baby eyes...

Thanks for your insights
Have a good path with Giraph ! :)

[1] http://www.slideshare.net/averyching/20110628giraph-hadoop-summit


Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.

Sounds good to me. Thanks for your reply Avery.

- Henry

On Thu, Jul 21, 2011 at 4:39 PM, Avery Ching <ac...@yahoo-inc.com> wrote:
> Henry,
>
> While we haven't begun too much work on a generic library, the intent is to provide generic vertex input/output formats, aggregators, combiners, and graph computations that make it very easy for a user to get started right away.  None of these need to be explicitly integrated with Hadoop or Hadoop objects.  That being said, we provide users the ability to use existing Hadoop Writable implementations, such as IntWritable, FloatWritable, etc. to make their lives easier rather than reimplementing those basic types.  Similarly, the methods of VertexInputFormat/VertexOutputFormat need not be implemented using an underlying Hadoop InputFormat/OutputFormat, but they are similar to make it easy to do so if desired.
>
> Hope that answers your question,
>
> Avery
>
> On Jul 21, 2011, at 4:09 PM, Henry Saputra wrote:
>
>> Will the library of generic graph algorithms be tightly coupled with the
>> Hadoop integration piece?
>>
>> - Henry

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Avery Ching <ac...@yahoo-inc.com>.

Henry,

While we haven't begun too much work on a generic library, the intent is to provide generic vertex input/output formats, aggregators, combiners, and graph computations that make it very easy for a user to get started right away.  None of these need to be explicitly integrated with Hadoop or Hadoop objects.  That being said, we provide users the ability to use existing Hadoop Writable implementations, such as IntWritable, FloatWritable, etc. to make their lives easier rather than reimplementing those basic types.  Similarly, the methods of VertexInputFormat/VertexOutputFormat need not be implemented using an underlying Hadoop InputFormat/OutputFormat, but they are similar to make it easy to do so if desired.
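
As a rough sketch of what reusing Writable-style types means in practice: Hadoop's Writable interface asks a type to serialize itself with write(DataOutput) and restore itself with readFields(DataInput).  The example below mirrors that contract with a local stand-in interface (SimpleWritable) so that it compiles without Hadoop on the classpath; RankValue and the helper methods are hypothetical names for illustration and are not part of Giraph or Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Round-trips a custom vertex value through a Writable-style contract (hypothetical names). */
public class WritableValueSketch {

    /** Local stand-in with the same two method shapes as Hadoop's Writable interface. */
    interface SimpleWritable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    /** Hypothetical vertex value: a rank plus the vertex's out-degree. */
    static final class RankValue implements SimpleWritable {
        double rank;
        int outDegree;

        RankValue() { }

        RankValue(double rank, int outDegree) {
            this.rank = rank;
            this.outDegree = outDegree;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeDouble(rank);
            out.writeInt(outDegree);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Fields must be read in the same order they were written.
            rank = in.readDouble();
            outDegree = in.readInt();
        }
    }

    /** Serializes a value into a byte array, as a framework might before sending it. */
    static byte[] toBytes(SimpleWritable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            value.write(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Fills an existing instance from bytes produced by toBytes. */
    static void fromBytes(SimpleWritable target, byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            target.readFields(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = toBytes(new RankValue(0.25, 7));
        RankValue copy = new RankValue();
        fromBytes(copy, bytes);
        System.out.println(copy.rank + " " + copy.outDegree + " (" + bytes.length + " bytes)");
    }
}
```

Existing types such as IntWritable and FloatWritable already implement this pair of methods, which is why reusing them saves users from writing serialization code like RankValue's for basic values.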

Hope that answers your question,

Avery

On Jul 21, 2011, at 4:09 PM, Henry Saputra wrote:

> Will the library of generic graph algorithms be tightly coupled with the
> Hadoop integration piece?
> 
> - Henry
> 
> On Fri, Jul 15, 2011 at 11:14 AM, Avery Ching <ac...@yahoo-inc.com> wrote:
>> Hi,
>> 
>> I would like to propose Giraph as an Apache Incubator project.  Giraph is a large-scale graph processing infrastructure (inspired by Pregel) that runs entirely on Hadoop.  Giraph applications and MapReduce jobs coexist on shared Hadoop instances and Giraph applications can be part of Oozie workflows as a normal MapReduce job.
>> 
>> Here is a link to the proposal in our GitHub wiki:
>> 
>> https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal
>> 
>> The proposal is also inlined below:
>> 
>> Thanks!
>> 
>> Avery
>> 
>> 
>> 
>> = Giraph : Large-scale graph processing on Hadoop =
>> 
>> == Abstract ==
>> 
>> Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel (BSP)-based graph processing framework.
>> 
>> == Proposal ==
>> 
>> Graph processing platforms to run large-scale algorithms (such as page rank, shared connections, personalization-based popularity, etc.) have become quite popular.  Some recent examples include Pregel and HaLoop.  For general-purpose big data computation, the MapReduce computation model is widely adopted and the most deployed MapReduce infrastructure is Apache Hadoop.  We have implemented a graph-processing framework that is launched as a typical Hadoop MapReduce job to leverage existing Hadoop infrastructure, such as Amazon’s EC2.  Giraph builds upon the graph-oriented nature of Pregel but additionally adds fault-tolerance to the coordinator process with the use of ZooKeeper as its centralized coordination service.  Additionally, Giraph will include a library of generic graph algorithms.
>> 
>> == Background ==
>> 
>> Giraph initially began development as a side project at Yahoo! at the end of 2010.  It was made functional in a month, and we then started adding various features.  Development has been focused on internal customers' needs until this point.
>> 
>> == Rationale ==
>> 
>> Web and online social graphs have been rapidly growing in size and scale during the past decade.  In 2008, Google estimated that the number of web pages reached over a trillion.  Online social networking and email sites, including Yahoo!, Google, Microsoft, Facebook, LinkedIn, and Twitter, have hundreds of millions of users and are expected to grow much more in the future.  Processing these graphs plays a big role in relevant and personalized information for users, such as results from a search engine or news in an online social networking site.
>> 
>> == Initial Goals ==
>> 
>> At this point, most of the functionality has been implemented and we are looking to get more adoption and contributions from users outside Yahoo!.   We want to ensure that performance scales and that the code is robust and fault tolerant.
>> 
>> == Current Status ==
>> 
>> === Meritocracy ===
>> 
>> Giraph was initially developed by Avery Ching and Christian Kunz beginning in December 2010 at Yahoo!.  Other developers using Giraph at Yahoo! are making suggestions and adding code.  We are reaching out to other folks at social networking companies for additional usage and development.
>> 
>> === Community ===
>> 
>> Several groups who are interested in either joining our project or using our code have contacted us.  We certainly believe that there is a lot of interest and are actively looking to improve and expand the community.
>> 
>> === Core Developers ===
>> 
>> Avery Ching: Wrote a majority of the code
>> Christian Kunz: Wrote most of the communication code and security integration with Hadoop
>> 
>> === Alignment ===
>> 
>> Giraph uses several Apache projects as its underlying infrastructure (Hadoop and ZooKeeper).   It also builds on Apache Maven.
>> 
>> == Known Risks ==
>> 
>> === Orphaned products ===
>> 
>> There are many social networking companies that would be interested in using this graph-processing framework and we have already received interest from some of them.  Yahoo! is already using this code in production and will certainly continue to use it in the future as well.
>> 
>> === Inexperience with Open Source ===
>> 
>> While the initial developers have limited experience contributing to open-source projects, Yahoo! as a company has a strong commitment to open source and we have several advisors that we can ask for help.
>> 
>> === Homogenous Developers ===
>> 
>> At this time, the project is relatively young and the developers work at only two companies (Yahoo! and Jybe).  However, given the interest we have seen in the project, we expect the diversity to improve in the near future.
>> 
>> === Reliance on Salaried Developers ===
>> 
>> Currently Giraph is being developed using a combination of salaried and volunteer time.  We expect that other corporations will take an interest in this project and likely contribute with salaried developers.  Some individuals will likely spend volunteer time on it as well.  It is still early in the project and we are hoping for a lot of growth.
>> 
>> === Relationships with Other Apache Products ===
>> 
>> Giraph depends on many Apache projects: Hadoop, ZooKeeper, Log4j, Commons, etc.  It is built using Apache Maven.
>> 
>> Giraph has some overlapping functionality with Apache Hama.  However, there are some significant differences.  Giraph focuses on graph-based bulk synchronous parallel (BSP) computing, while Apache Hama is more for general-purpose BSP computing.  Giraph runs on the Hadoop infrastructure, while Apache Hama uses its own computing framework.
>> 
>> === An Excessive Fascination with the Apache Brand ===
>> 
>> The Apache brand is likely to help us find contributors; however, our interest in Apache is primarily because the other projects that we depend on are also Apache projects, and it makes sense that all this software be available from the same place.
>> 
>> === Documentation ===
>> 
>> Currently we have little documentation, but several examples.  We are working on improving this situation.
>> 
>> === Initial Source ===
>> 
>> The initial source code is from Yahoo!, where development began in December 2010.  It is already available on GitHub at https://github.com/aching/Giraph.
>> 
>> === Source and Intellectual Property Submission Plan ===
>> 
>> We intend the entire code base to be licensed under the Apache License, Version 2.0.
>> 
>> === External Dependencies ===
>> 
>> The required dependencies all have Apache-compatible licenses.  The components with non-Apache licenses are enumerated below:
>> * JSON – Public Domain
>> 
>> === Cryptography ===
>> 
>> Giraph depends on secure Hadoop that can optionally use Kerberos.
>> 
>> == Required Resources ==
>> 
>> === Mailing lists ===
>> 
>> * giraph-private (with moderated subscriptions)
>> * giraph-dev
>> * giraph-commits
>> * giraph-users
>> 
>> === Subversion Directory ===
>> 
>> https://svn.apache.org/repos/asf/incubator/giraph
>> 
>> === Issue Tracking ===
>> 
>> JIRA Giraph (GIRAPH)
>> 
>> === Other Resources ===
>> 
>> Giraph has integration tests that can be run with the LocalJobRunner.  These same tests are also designed to be run on a small (even single node) Hadoop cluster.  While not required at this time, it would be nice if such a resource were available.
>> 
>> === Initial Committers ===
>> 
>> Avery Ching, aching at yahoo-inc dot com
>> Christian Kunz, christian at jybe-inc dot com
>> Owen O’Malley, owen at hortonworks dot com
>> 
>> === Affiliations ===
>> 
>> Avery Ching, Yahoo!
>> Christian Kunz, Jybe
>> 
>> == Sponsors ==
>> 
>> === Champion ===
>> 
>> Owen O’Malley
>> 
>> === Nominated Mentors ===
>> 
>> Owen O’Malley
>> 
>> === Sponsoring Entity ===
>> 
>> Apache Incubator PMC
>> 
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: [PROPOSAL] Proposing Giraph for the Apache Incubator

Posted by Henry Saputra <he...@gmail.com>.

Will the library of generic graph algorithms be tightly coupled with the
Hadoop integration piece?

- Henry

On Fri, Jul 15, 2011 at 11:14 AM, Avery Ching <ac...@yahoo-inc.com> wrote:
> Hi,
>
> I would like to propose Giraph as an Apache Incubator project.  Giraph is a large-scale graph processing infrastructure (inspired by Pregel) that runs entirely on Hadoop.  Giraph applications and MapReduce jobs coexist on shared Hadoop instances and Giraph applications can be part of Oozie workflows as a normal MapReduce job.
>
> Here is a link to the proposal in our GitHub wiki:
>
> https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal
>
> The proposal is also inlined below:
>
> Thanks!
>
> Avery
>
>
>
> = Giraph : Large-scale graph processing on Hadoop =
>
> == Abstract ==
>
> Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel (BSP)-based graph processing framework.
>
> == Proposal ==
>
> Graph processing platforms to run large-scale algorithms (such as page rank, shared connections, personalization-based popularity, etc.) have become quite popular.  Some recent examples include Pregel and HaLoop.  For general-purpose big data computation, the MapReduce computation model is widely adopted and the most deployed MapReduce infrastructure is Apache Hadoop.  We have implemented a graph-processing framework that is launched as a typical Hadoop MapReduce job to leverage existing Hadoop infrastructure, such as Amazon’s EC2.  Giraph builds upon the graph-oriented nature of Pregel but additionally adds fault-tolerance to the coordinator process with the use of ZooKeeper as its centralized coordination service.  Additionally, Giraph will include a library of generic graph algorithms.
>
> == Background ==
>
> Giraph initially began development as a side project at Yahoo! at the end of 2010.  It was made functional in a month, and we then started adding various features.  Development has been focused on internal customers’ needs until this point.
>
> == Rationale ==
>
> Web and online social graphs have been rapidly growing in size and scale during the past decade.  In 2008, Google estimated that the number of web pages reached over a trillion.  Online social networking and email sites, including Yahoo!, Google, Microsoft, Facebook, LinkedIn, and Twitter, have hundreds of millions of users and are expected to grow much more in the future.  Processing these graphs plays a big role in providing relevant and personalized information for users, such as results from a search engine or news in an online social networking site.
>
> == Initial Goals ==
>
> At this point, most of the functionality has been implemented and we are looking to get more adoption and contributions from users outside Yahoo!.   We want to ensure that performance scales and that the code is robust and fault tolerant.
>
> == Current Status ==
>
> === Meritocracy ===
>
> Giraph was initially developed by Avery Ching and Christian Kunz beginning in December 2010 at Yahoo!.  Other developers using Giraph at Yahoo! are making suggestions and adding code.  We are reaching out to other folks at social networking companies for additional usage and development.
>
> === Community ===
>
> Several groups who are interested in either joining our project or using our code have contacted us.  We certainly believe that there is a lot of interest and are actively looking to improve and expand the community.
>
> === Core Developers ===
>
> Avery Ching: Wrote a majority of the code
> Christian Kunz: Wrote most of the communication code and security integration with Hadoop
>
> === Alignment ===
>
> Giraph uses several Apache projects as its underlying infrastructure (Hadoop and ZooKeeper).   It also builds on Apache Maven.
>
> == Known Risks ==
>
> === Orphaned products ===
>
> There are many social networking companies that would be interested in using this graph-processing framework and we have already received interest from some of them.  Yahoo! is already using this code in production and will certainly continue to use it in the future as well.
>
> === Inexperience with Open Source ===
>
> While the initial developers have limited experience contributing to open-source projects, Yahoo! as a company has a strong commitment to open source and we have several advisors that we can ask for help.
>
> === Homogenous Developers ===
>
> At this time, the project is relatively young and the developers work at only two companies (Yahoo! and Jybe).  However, given the interest we have seen in the project, we expect the diversity to improve in the near future.
>
> === Reliance on Salaried Developers ===
>
> Currently Giraph is being developed using a combination of salaried and volunteer time.  We expect that other corporations will take an interest in this project and likely contribute with salaried developers.  Some individuals will likely spend volunteer time on it as well.  It is still early in the project and we are hoping for a lot of growth.
>
> === Relationships with Other Apache Products ===
>
> Giraph depends on many Apache projects: Hadoop, ZooKeeper, Log4j, Commons, etc.  It is built using Apache Maven.
>
> Giraph has some overlapping functionality with Apache Hama.  However, there are some significant differences.  Giraph focuses on graph-based bulk synchronous parallel (BSP) computing, while Apache Hama is more for general-purpose BSP computing.  Giraph runs on the Hadoop infrastructure, while Apache Hama uses its own computing framework.
>
> === An Excessive Fascination with the Apache Brand ===
>
> The Apache brand is likely to help us find contributors; however, our interest in Apache is primarily because the other projects that we depend on are also Apache projects, and it makes sense that all this software be available from the same place.
>
> === Documentation ===
>
> Currently we have little documentation, but several examples.  We are working on improving this situation.
>
> === Initial Source ===
>
> The initial source code is from Yahoo!, where development began in December 2010.  It is already available on GitHub at https://github.com/aching/Giraph.
>
> === Source and Intellectual Property Submission Plan ===
>
> We intend the entire code base to be licensed under the Apache License, Version 2.0.
>
> === External Dependencies ===
>
> The required dependencies all have Apache-compatible licenses.  The components with non-Apache licenses are enumerated below:
> * JSON – Public Domain
>
> === Cryptography ===
>
> Giraph depends on secure Hadoop that can optionally use Kerberos.
>
> == Required Resources ==
>
> === Mailing lists ===
>
> * giraph-private (with moderated subscriptions)
> * giraph-dev
> * giraph-commits
> * giraph-users
>
> === Subversion Directory ===
>
> https://svn.apache.org/repos/asf/incubator/giraph
>
> === Issue Tracking ===
>
> JIRA Giraph (GIRAPH)
>
> === Other Resources ===
>
> Giraph has integration tests that can be run with the LocalJobRunner.  These same tests are also designed to be run on a small (even single node) Hadoop cluster.  While not required at this time, it would be nice if such a resource were available.
>
> === Initial Committers ===
>
> Avery Ching, aching at yahoo-inc dot com
> Christian Kunz, christian at jybe-inc dot com
> Owen O’Malley, owen at hortonworks dot com
>
> === Affiliations ===
>
> Avery Ching, Yahoo!
> Christian Kunz, Jybe
>
> == Sponsors ==
>
> === Champion ===
>
> Owen O’Malley
>
> === Nominated Mentors ===
>
> Owen O’Malley
>
> === Sponsoring Entity ===
>
> Apache Incubator PMC
>
