Posted to general@incubator.apache.org by "Li,De(BDG)" <li...@baidu.com> on 2018/06/08 04:45:32 UTC

Looking for Champion

Hi all,

I am Reed, a developer on the team behind Palo (an MPP-based interactive SQL data warehouse).
https://github.com/baidu/palo/wiki/Palo-Overview

We propose to contribute Palo as an Apache Incubator project, and
we are still looking for a Champion, if anyone would like to volunteer. Thanks a lot.

Best Regards,
Reed

===================
The draft of the proposal is below:

#Apache Palo

##Abstract

Palo is an MPP-based interactive SQL data warehouse for reporting and analysis.

##Proposal

We propose to contribute the Palo codebase and associated artifacts (e.g. documentation, web-site content etc.) to the Apache Software Foundation with the intent of forming a productive, meritocratic and open community around Palo’s continued development, according to the ‘Apache Way’.

Baidu owns several trademarks regarding Palo, and proposes to transfer ownership of those trademarks in full to the ASF.

###Overview of Palo

Palo’s implementation consists of two daemons: Frontend (FE) and Backend (BE).

**Frontend daemon** consists of a query coordinator and a catalog manager. The query coordinator is responsible for receiving users’ SQL queries, compiling them, and managing their execution. The catalog manager is responsible for managing metadata such as databases, tables, partitions, and replicas. Several frontend daemons can be deployed to provide fault tolerance and load balancing.

**Backend daemon** stores the data and executes the query fragments. Many backend daemons can also be deployed to provide scalability and fault tolerance.

A typical Palo cluster generally consists of several frontend daemons and dozens to hundreds of backend daemons.

Users can use MySQL client tools to connect to any frontend daemon and submit SQL queries. The Frontend receives a query and compiles it into query plans executable by the Backend, then sends the query plan fragments to the Backends. Each Backend builds a query execution DAG, and data is fetched and pipelined into the DAG. The final result is sent to the client via the Frontend. The main goal when distributing query fragment execution is to minimize data movement and maximize scan locality.
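To illustrate the scan-locality goal, here is a minimal sketch in Python. It is not Palo's actual scheduler: the function name, the tablet and backend identifiers, and the least-loaded tie-breaking rule are all assumptions made for illustration. It only shows the idea of sending each scan fragment to a backend that already stores a replica of the data, so that no data has to move before the scan.

```python
from collections import defaultdict


def assign_scan_fragments(tablet_replicas):
    """Assign each tablet's scan fragment to one backend holding a replica.

    tablet_replicas maps a (hypothetical) tablet id to the list of backends
    that store a replica of it. Choosing a replica holder keeps the scan
    local; among the holders, the least-loaded backend is picked so that
    the work is spread out.
    """
    load = defaultdict(int)  # backend -> number of fragments assigned so far
    assignment = {}
    for tablet in sorted(tablet_replicas):
        holders = tablet_replicas[tablet]
        if not holders:
            raise ValueError(f"tablet {tablet} has no replica")
        # Ties are broken by backend name so the result is deterministic.
        chosen = min(holders, key=lambda be: (load[be], be))
        assignment[tablet] = chosen
        load[chosen] += 1
    return assignment


# Hypothetical cluster layout: four tablets, each replicated on two backends.
replicas = {
    "t1": ["be1", "be2"],
    "t2": ["be1", "be3"],
    "t3": ["be2", "be3"],
    "t4": ["be1", "be2"],
}
plan = assign_scan_fragments(replicas)
print(plan)
```

A real scheduler would also have to weigh replica health, data size, and the cost of moving intermediate results between fragments; the sketch ignores all of that.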

##Background

At Baidu, prior to Palo, different tools were deployed to meet diverse requirements. When a use case required capabilities that no single tool could provide, users were forced to build hybrid architectures that stitched multiple tools together. We believe they shouldn’t need to accept such inherent complexity. A storage system built to provide great performance across a broad range of workloads is a more elegant solution to the problems that hybrid architectures aim to solve. Palo is that solution.

Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides high-concurrency, low-latency point queries as well as high-throughput ad-hoc analytical queries. It supports bulk batch data loading as well as near-real-time mini-batch loading. Palo also provides high availability, reliability, fault tolerance, and scalability.

##Rationale

Palo mainly integrates the technology of Google Mesa and Apache Impala.

Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of users’ and systems’ requirements, including near real-time data ingestion and query ability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.

Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. By virtue of its superior performance and rich functionality, Impala is now comparable to many commercial MPP database query engines. Mesa can satisfy many of our storage requirements, but Mesa itself does not provide a SQL query engine; Impala is a very good MPP SQL query engine, but it lacks a perfect distributed storage engine. So in the end we chose to combine these two technologies.

Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. We then deeply integrated this storage engine with the Impala query engine. Query compiling, query execution coordination, and catalog management for the storage engine are integrated into the frontend daemon; query execution and data storage are integrated into the backend daemon. With this integration, we implemented a single, full-featured, high-performance, state-of-the-art MPP database while maintaining simplicity.

##Current Status

Palo is an open source project on GitHub (https://github.com/baidu/palo).

###Meritocracy

Palo has been deployed in production at Baidu and serves more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data. Still, we look forward to growing a rich user and developer community.

###Community

Palo seeks to develop developer and user communities during incubation.

###Core Developers

* Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
* Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
* Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
* De Li (https://github.com/lide-reed, mailtolide@sina.com)
* Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
* Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
* Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)

###Alignment

Palo is related to several other Apache projects:

* Palo can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
* Palo is closely integrated with Impala, which is also being proposed to the Incubator.
* Palo uses Apache Thrift as its RPC and serialization framework of choice.

##Known Risks

###Orphaned Products

The core developers of the Palo team plan to work full time on this project. There is very little risk of Palo being orphaned, since at least one large company (Baidu) uses it extensively in production. For example, there are currently more than 200 use cases running on Palo in production. Furthermore, since Palo was open sourced at the beginning of October 2017, it has received more than 660 stars and been forked nearly 170 times. We plan to extend and diversify this community further through Apache.

###Inexperience with Open Source

The core developers are all active users and followers of open source. They are already committers and contributors to the Palo GitHub project. All have worked on source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Although the core developers do not have Apache open source experience, there are plans to bring individuals with Apache open source experience onto the project.

###Homogenous Developers

Most of the core developers are from Baidu, but since Palo was open sourced, it has received many bug fixes and enhancements from developers outside Baidu.

###Reliance on Salaried Developers

Baidu invested in Palo as its OLAP solution, and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable OLAP solutions, we look forward to other Apache developers and researchers contributing to the project. Key to addressing the risk of relying on salaried developers from a single entity is increasing the diversity of contributors and actively encouraging domain experts in the BI space to contribute. Apache Palo intends to do this.

###An Excessive Fascination with the Apache Brand

Palo is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Palo project is already in production use inside Baidu, but is not expected to be a Baidu product for external customers. As such, the Palo project is not seeking to use the Apache brand as a marketing tool.

##Documentation

Information about Palo can be found at https://github.com/baidu/palo. The following links provide more information about Palo in open source:

* Palo wiki site: https://github.com/baidu/palo/wiki
* Codebase at Github: https://github.com/baidu/palo
* Issue Tracking: https://github.com/baidu/palo/issues
* Overview: https://github.com/baidu/palo/wiki/Palo-Overview
* FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ

##Initial Source

Palo has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on Github.com under an Apache license at https://github.com/baidu/palo.

##External Dependencies

Palo has the following external dependencies.

* Google gflags (BSD)
* Google glog (BSD)
* Apache Thrift (Apache Software License v2.0)
* Apache Commons (Apache Software License v2.0)
* Boost (Boost Software License)
* OpenLdap (OpenLDAP Software License)
* rapidjson (Tencent)
* Google RE2 (BSD-style)
* lz4 (BSD)
* snappy (BSD)
* cyrus-sasl (CMU License)
* Twitter Bootstrap (Apache Software License v2.0)
* d3 (BSD)
* LLVM (BSD-like)

Build and test dependencies:

* ant (Apache Software License v2.0)
* Apache Maven (Apache Software License v2.0)
* cmake (BSD)
* clang (BSD)
* Google gtest (Apache Software License v2.0)

##Required Resources

###Mailing List

There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:

private@palo.incubator.apache.org
dev@palo.incubator.apache.org
commits@palo.incubator.apache.org

###Subversion Directory

Upon entering incubation: https://github.com/baidu/palo.
After incubation, we want to move the existing repo from https://github.com/baidu/palo to Apache infrastructure.

###Issue Tracking

Palo currently uses GitHub to track issues. We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.

###Other Resources

The existing code already has unit tests so we will make use of existing Apache continuous testing infrastructure. The resulting load should not be very large.

##Initial Committers

##Affiliations

The initial committers are employees of Baidu Inc. The nominated mentors are employees of TODO.

##Sponsors

###Champion

TODO

###Nominated Mentors

* Sijie Guo, guosijie@gmail.com
* Luke Han, lukehan@apache.org
* Zheng Shao, zshao@apache.org

###Sponsoring Entity

We are requesting the Incubator to sponsor this project.

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Dave,

We have a new name, Doris, so we will rename Palo to Doris.
I have updated the proposal as follows:

#Apache Doris

##Abstract

Doris is an MPP-based interactive SQL data warehouse for reporting and analysis.

##Proposal

We propose to contribute the Doris codebase and associated artifacts (e.g. documentation, web-site content etc.) to the Apache Software Foundation, and aim to build an open community around Doris’s continued development in the ‘Apache Way’.

###Overview of Doris

Doris’s implementation consists of two daemons: Frontend (FE) and Backend (BE).

**Frontend daemon** consists of a query coordinator and a catalog manager. The query coordinator is responsible for receiving users’ SQL queries, compiling them, and managing their execution. The catalog manager is responsible for managing metadata such as databases, tables, partitions, and replicas. Several frontend daemons can be deployed to provide fault tolerance and load balancing.

**Backend daemon** stores the data and executes the query fragments. Many backend daemons can also be deployed to provide scalability and fault tolerance.

A typical Doris cluster generally consists of several frontend daemons and dozens to hundreds of backend daemons.

Users can use MySQL client tools to connect to any frontend daemon and submit SQL queries. The Frontend receives a query and compiles it into query plans executable by the Backend, then sends the query plan fragments to the Backends. Each Backend builds a query execution DAG, and data is fetched and pipelined into the DAG. The final result is sent to the client via the Frontend. The main goal when distributing query fragment execution is to minimize data movement and maximize scan locality.

##Background

At Baidu, prior to Doris, different tools were deployed to meet diverse requirements. When a use case required capabilities that no single tool could provide, users were forced to build hybrid architectures that stitched multiple tools together. We believe they shouldn’t need to accept such inherent complexity. A storage system built to provide great performance across a broad range of workloads is a more elegant solution to the problems that hybrid architectures aim to solve. Doris is that solution.

Doris is designed to be a simple, single, tightly coupled system that does not depend on other systems. Doris provides high-concurrency, low-latency point queries as well as high-throughput ad-hoc analytical queries. It supports bulk batch data loading as well as near-real-time mini-batch loading. Doris also provides high availability, reliability, fault tolerance, and scalability.

##Rationale

Doris mainly integrates the technology of Google Mesa and Apache Impala.

Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of users’ and systems’ requirements, including near real-time data ingestion and query ability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.

Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. By virtue of its superior performance and rich functionality, Impala is now comparable to many commercial MPP database query engines. Mesa can satisfy many of our storage requirements, but Mesa itself does not provide a SQL query engine; Impala is a very good MPP SQL query engine, but it lacks a perfect distributed storage engine. So in the end we chose to combine these two technologies.

Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. We then deeply integrated this storage engine with the Impala query engine. Query compiling, query execution coordination, and catalog management for the storage engine are integrated into the frontend daemon; query execution and data storage are integrated into the backend daemon. With this integration, we implemented a single, full-featured, high-performance, state-of-the-art MPP database while maintaining simplicity.

##Current Status

Doris is an open source project on GitHub (https://github.com/baidu/palo).

###Meritocracy

Doris has been deployed in production at Baidu and serves more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data. Still, we look forward to growing a rich user and developer community.

###Community

Doris seeks to develop developer and user communities during incubation.

###Core Developers

* Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
* Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
* Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
* De Li (https://github.com/lide-reed, mailtolide@sina.com)
* Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
* Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
* Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)

###Alignment

Doris is related to several other Apache projects:

* Doris can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
* Doris is closely integrated with Impala, which has graduated from the Apache Incubator.
* Doris uses Apache Thrift as its RPC and serialization framework of choice.

##Known Risks

###Orphaned Products

The core developers of the Doris team plan to work full time on this project. There is very little risk of Doris being orphaned, since at least one large company (Baidu) uses it extensively in production. For example, there are currently more than 200 use cases running on Doris in production. Furthermore, since Doris was open sourced at the beginning of October 2017, it has received more than 660 stars and been forked nearly 170 times. We plan to extend and diversify this community further through Apache.

###Inexperience with Open Source

The core developers are all active users and followers of open source. They are already committers and contributors to the Doris GitHub project. All have worked on source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Although the core developers do not have Apache open source experience, there are plans to bring individuals with Apache open source experience onto the project.

###Homogenous Developers

Most of the core developers are from Baidu, but since Doris was open sourced, it has received many bug fixes and enhancements from developers outside Baidu.

###Reliance on Salaried Developers

Baidu invested in Doris as its OLAP solution, and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable OLAP solutions, we look forward to other Apache developers and researchers contributing to the project. Key to addressing the risk of relying on salaried developers from a single entity is increasing the diversity of contributors and actively encouraging domain experts in the BI space to contribute. Apache Doris intends to do this.

###An Excessive Fascination with the Apache Brand

Doris is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Doris project is already in production use inside Baidu, but is not expected to be a Baidu product for external customers. As such, the Doris project is not seeking to use the Apache brand as a marketing tool.

##Documentation

Information about Doris can be found at https://github.com/baidu/palo. The following links provide more information about Doris in open source:

* Doris wiki site: https://github.com/baidu/palo/wiki
* Codebase at Github: https://github.com/baidu/palo
* Issue Tracking: https://github.com/baidu/palo/issues
* Overview: https://github.com/baidu/palo/wiki/Palo-Overview
* FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ

##Initial Source

Doris has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on Github.com under an Apache license at https://github.com/baidu/palo.

##External Dependencies

Doris has the following external dependencies.

* Google gflags (BSD)
* Google glog (BSD)
* Apache Thrift (Apache Software License v2.0)
* Apache Commons (Apache Software License v2.0)
* Boost (Boost Software License)
* rapidjson (Tencent)
* Google RE2 (BSD-style)
* lz4 (BSD)
* snappy (BSD)
* Twitter Bootstrap (Apache Software License v2.0)
* d3 (BSD)
* LLVM (BSD-like)

Build and test dependencies:

* ant (Apache Software License v2.0)
* Apache Maven (Apache Software License v2.0)
* cmake (BSD)
* clang (BSD)
* Google gtest (Apache Software License v2.0)

##Required Resources

###Mailing List

There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:

private@doris.incubator.apache.org
dev@doris.incubator.apache.org
commits@doris.incubator.apache.org

###Subversion Directory

Upon entering incubation: https://github.com/baidu/palo.
After incubation, we want to move the existing repo from https://github.com/baidu/palo to Apache infrastructure.

###Issue Tracking

Doris currently uses GitHub to track issues. We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.

###Other Resources

The existing code already has unit tests so we will make use of existing Apache continuous testing infrastructure. The resulting load should not be very large.

##Initial Committers

* Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
* Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
* Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
* De Li (https://github.com/lide-reed, mailtolide@sina.com)
* Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
* Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
* Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
* Sijie Guo (guosijie@gmail.com)
* Zheng Shao (zshao@apache.org)

##Affiliations

The initial committers are employees of Baidu Inc.

##Sponsors

###Champion

Dave Fisher, dave2wave@comcast.net

###Nominated Mentors

* Luke Han, lukehan@apache.org
* Dave Fisher, dave2wave@comcast.net
* Willem Jiang, willem.jiang@gmail.com

###Sponsoring Entity

We are requesting the Incubator to sponsor this project.

On 2018/6/19 6:54 PM, "Li,De(BDG)" <li...@baidu.com> wrote:

Hi Dave,

Thank you for your summary.

For #1, understood; we will find a new name within the next several days.

For #2, I see. Regarding licensing, we are rechecking the licenses of all components in Palo, and we have fixed most of the issues we found, as I wrote in my last email. We will continue this work carefully.

For #3, we have reflected on Jim's suggestion. We will try to find or define a clean interface between Palo and Impala, and to determine which parts should stay in Palo and which should become patches for Impala. The details and roadmap are still to be worked out.

For #4, I accept your suggestion and will update the proposal.

Once I have a new name, I will send you the updated proposal.

Best Regards,
Reed

From: Dave Fisher <da...@comcast.net>
Reply-To: <ge...@incubator.apache.org>
Date: Tuesday, June 19, 2018, 2:08 AM
To: <ge...@incubator.apache.org>
Subject: Re: Looking for Champion

Hi Li,De -

Since I agreed to champion this project I think that we need a summary of what the Incubator PMC cares about in order to accept a podling, and what the prospective project needs to address. We also need to be clear what should happen during Incubation and at what time. I think that many of the questions that came up in this thread had to do with assessing how much effort it will take to Incubate Palo (or whatever the name will be).

(1) The name Palo. Since there seems to be an issue with that name we should have a new name. It is not unknown for a podling to change its name, but that does generate extra work for Infrastructure to change the name after podling start up. It would be our preference for Palo to find a new name prior to VOTING on the proposal. Please do this elsewhere and come back to me with the new name so that I can help with the updated proposal.

(2) Licensing of the software. Several bits came up as questionable. Regardless of cleanup that has already occurred we have identified that we will need to be very careful. It will be important to discuss and carefully handle the Software Grant Agreement to make sure that the source listed is correct. I think that the SGA must come early during incubation.

(3) Relationship with Impala. Palo has apparently forked portions of Impala. This means that some are concerned that there is a missed synergy with the Apache Impala project. Is there a clean interface that can be built between the projects? It would help if the Palo developers would explore this with Impala at dev@impala.apache.org.

That said, part of the Incubation process is to learn the Apache Way. IMHO it is ok for the relationship between the Impala PMC and a podling PPMC to be a work in progress.

(4) Currently, Willem, Luke Han and Dave Fisher are qualified to officially mentor. I suggest that Sijie Guo and Zheng Shao be included as Initial Committers in order to help from within the PPMC.

On Jun 14, 2018, at 11:03 AM, Jim Apple <jb...@cloudera.com.INVALID> wrote:

I don't want to be a stickler, but I don't think "For issues mentioned by
Jim, Todd and Tim, I have replied on last Saturday."

To my email about Palo being an ASF project as a storage system without a
query engine, you replied only, "We will seriously consider this proposal."

I see no response to Tim's concern that "The code isn't owned by any
individual, I contributed it to Apache and it's
free for anyone to do what they want to do with it, but pulling in
improvements from other projects without any attempt to attribute it or
contribute improvements back seems contrary to the Apache way.”

Jim - do you need answers to these concerns prior to agreeing to accept this project into the Incubator?

Regards,
Dave


On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <li...@baidu.com> wrote:

Hi all,

Regarding Palo, we have fixed the following issues.

1. Impala-related issues
For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday.

2. License issues
For the issues mentioned by Todd and Ted:
1) be/aes/* comes from mysql-5.6, GPL v2.1 license
Fixed: removed the AES-related code.
https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced

2) be/util/mysql_dtoa.cpp was copied from MySQL, GPL license
Fixed: removed the mysql_dtoa-related code.
https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1

3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
Fixed: restored the original license; we are looking for another HTTP server
to replace it.
https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831

4) be/rpc/*
Fixed: we have replaced it with brpc, and we will remove the Hypertable code
in a few weeks, once users have upgraded to brpc.
https://github.com/baidu/palo/tree/master/be/src/rpc

3. Dependency licenses
For the issue mentioned by Dave: it looks like Palo does not depend on
OpenLDAP and cyrus-sasl directly, but some third-party libraries, libcurl
and gperftools for instance, need them to compile.
For rapidjson, we are looking for an alternative.

4. About the name Palo
For the issue mentioned by Julian:
we are working on finding a better one.

Best Regards,
Reed



On 2018/6/13 8:54 AM, "Li,De(BDG)" <li...@baidu.com> wrote:

Hi Julian,

Thank you.

It looks like we have to find another name.
If anyone has a good name, please feel free to let me know.

Best Regards,
Reed

On 2018/6/13 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:

Note that there is an existing database product called Palo - an open
source OLAP engine by German company Jedox[1]. There is a high
likelihood that Palo would have to change its name during incubation, if
accepted.

Julian

[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)



On Jun 10, 2018, at 3:49 AM, Han Luke <lu...@gmail.com> wrote:

Cool Dave, it’s great to have you as the Champion.


________________________________
From: Tan,Zhongyi <ta...@baidu.com>
Sent: Saturday, June 9, 2018 8:16:28 AM
To: general@incubator.apache.org
Subject: Re: Looking for Champion

Thanks, Willem.

We really appreciate it.

On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:

Hi,

I'm willing to be the Mentor.
Please count me in.



Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net> wrote:

Hi -

I’m willing to Champion and Mentor. I have a couple of comments
inline.
I’ll look at dependency licenses later today. It’s early for me.


On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:

Hi all,

I am Reed, as a developer worked with the team for Palo (a MPP-based
interactive SQL data warehousing).
https://github.com/baidu/palo/wiki/Palo-Overview

We propose to contribute Palo as an Apache Incubator project, and
we are still looking for possible Champion if anyone would like to
volunteer. Thanks a lot.

Best Regards,
Reed

===================
The draft of the proposal as below:

#Apache Palo

##Abstract

Palo is a MPP-based interactive SQL data warehousing for reporting
and
analysis.

##Proposal

We propose to contribute the Palo codebase and associated artifacts
(e.g. documentation, web-site content etc.) to the Apache Software
Foundation with the intent of forming a productive, meritocratic and
open
community around Palo’s continued development, according to the
‘Apache
Way’.

Baidu owns several trademarks regarding Palo, and proposes to
transfer
ownership of those trademarks in full to the ASF.

###Overview of Palo

Palo’s implementation consists of two daemons: Frontend (FE) and
Backend
(BE).

**Frontend daemon** consists of query coordinator and catalog
manager.
Query coordinator is responsible for receiving users’ sql queries,
compiling queries and managing queries execution. Catalog manager is
responsible for managing metadata such as databases, tables,
partitions,
replicas and etc. Several frontend daemons could be deployed to
guarantee
fault-tolerance, and load balancing.

**Backend daemon** stores the data and executes the query fragments.
Many backend daemons could also be deployed to provide scalability
and
fault-tolerance.

A typical Palo cluster generally composes of several frontend
daemons
and dozens to hundreds of backend daemons.

Users can use MySQL client tools to connect any frontend daemon to
submit SQL query. Frontend receives the query and compiles it into
query
plans executable by the Backend. Then Frontend sends the query plan
fragments to Backend. Backend will build a query execution DAG. Data
is
fetched and pipelined into the DAG. The final result response is sent
to
client via Frontend. The distribution of query fragment execution
takes
minimizing data movement and maximizing scan locality as the main
goal.

##Background

At Baidu, Prior to Palo, different tools were deployed to solve
diverse
requirements in many ways. And when a use case requires the
simultaneous
availability of capabilities that cannot all be provided by a single
tool,
users were forced to build hybrid architectures that stitch multiple
tools
together, but we believe that they shouldn’t need to accept such
inherent
complexity. A storage system built to provide great performance
across a
broad range of workloads provides a more elegant solution to the
problems
that hybrid architectures aim to solve. Palo is the solution.

Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides high-concurrency, low-latency point queries as well as high-throughput ad-hoc analytical queries. It supports bulk batch data loading as well as near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability.

##Rationale

Palo mainly integrates the technology of Google Mesa and Apache Impala.

Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and system requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.

Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. By virtue of its superior performance and rich functionality, Impala is now comparable to many commercial MPP database query engines. Mesa can satisfy many of our storage requirements, but Mesa itself does not provide a SQL query engine. Impala is a very good MPP SQL query engine, but it lacks an ideal distributed storage engine. So in the end we chose to combine these two technologies.

Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. We then deeply integrated this storage engine with the Impala query engine. Query compilation, query execution coordination, and catalog management of the storage engine are integrated into the frontend daemon; query execution and data storage are integrated into the backend daemon. With this integration, we implemented a single, full-featured, high-performance, state-of-the-art MPP database while maintaining simplicity.

##Current Status

Palo is an open source project on GitHub (https://github.com/baidu/palo).

###Meritocracy

Palo has been deployed in production at Baidu and serves more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data. Still, we look forward to growing a rich user and developer community.

###Community

Palo seeks to develop developer and user communities during incubation.

###Core Developers

* Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
* Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
* Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
* De Li (https://github.com/lide-reed, mailtolide@sina.com)
* Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
* Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
* Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)

###Alignment

Palo is related to several other Apache projects:

* Palo can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
* Palo is closely integrated with Impala, which is also being proposed to the Incubator.

Apache Impala has completed Incubation. Jim Apple is VP, Impala.

* Palo uses Apache Thrift as its RPC and serialization framework of choice.

##Known Risks

###Orphaned Products

The core developers of the Palo team plan to work full time on this project. There is very little risk of Palo being orphaned, since at least one large company (Baidu) uses it extensively in production. For example, there are currently more than 200 use cases running Palo in production. Furthermore, since Palo was open sourced at the beginning of October 2017, it has received more than 660 stars and been forked nearly 170 times. We plan to extend and diversify this community further through Apache.

###Inexperience with Open Source

The core developers are all active users and followers of open source. They are already committers and contributors to the Palo GitHub project. All have been involved with source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core developers do not have Apache open source experience, there are plans to onboard individuals with Apache open source experience onto the project.

###Homogenous Developers

Most of the core developers are from Baidu, but since Palo was open sourced, it has received many bug fixes and enhancements from developers not working at Baidu.

###Reliance on Salaried Developers

Baidu invested in Palo as its OLAP solution, and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable OLAP solutions, we look forward to other Apache developers and researchers contributing to the project. Key to addressing the risk of relying on salaried developers from a single entity is increasing the diversity of contributors and actively lobbying domain experts in the BI space to contribute. Apache Palo intends to do this.

###An Excessive Fascination with the Apache Brand

Palo is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Palo project is already in production use inside Baidu, but is not expected to be a Baidu product for external customers. As such, the Palo project is not seeking to use the Apache brand as a marketing tool.

##Documentation

Information about Palo can be found at https://github.com/baidu/palo. The following links provide more information about Palo in open source:

* Palo wiki site: https://github.com/baidu/palo/wiki
* Codebase at Github: https://github.com/baidu/palo
* Issue Tracking: https://github.com/baidu/palo/issues
* Overview: https://github.com/baidu/palo/wiki/Palo-Overview
* FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ

##Initial Source

Palo has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on GitHub under an Apache license at https://github.com/baidu/palo.

##External Dependencies

Palo has the following external dependencies.

* Google gflags (BSD)
* Google glog (BSD)
* Apache Thrift (Apache Software License v2.0)
* Apache Commons (Apache Software License v2.0)
* Boost (Boost Software License)
* OpenLdap (OpenLDAP Software License)
* rapidjson (Tencent)
* Google RE2 (BSD-style)
* lz4 (BSD)
* snappy (BSD)
* cyrus-sasl (CMU License)
* Twitter Bootstrap (Apache Software License v2.0)
* d3 (BSD)
* LLVM (BSD-like)

Build and test dependencies:

* ant (Apache Software License v2.0)
* Apache Maven (Apache Software License v2.0)
* cmake (BSD)
* clang (BSD)
* Google gtest (Apache Software License v2.0)
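
Because dependency licensing tends to come up during review, the two lists above can be grouped by license label to see at a glance which entries may deserve a closer look. The sketch below only restates the labels given in this proposal. The `REVIEWED_LABELS` set is a hypothetical allowlist picked for illustration; it is not ASF policy, and the helper names are assumptions as well.

```python
from collections import defaultdict

# Names and license labels copied from the dependency lists in this proposal.
DEPENDENCIES = {
    "Google gflags": "BSD",
    "Google glog": "BSD",
    "Apache Thrift": "Apache Software License v2.0",
    "Apache Commons": "Apache Software License v2.0",
    "Boost": "Boost Software License",
    "OpenLdap": "OpenLDAP Software License",
    "rapidjson": "Tencent",
    "Google RE2": "BSD-style",
    "lz4": "BSD",
    "snappy": "BSD",
    "cyrus-sasl": "CMU License",
    "Twitter Bootstrap": "Apache Software License v2.0",
    "d3": "BSD",
    "LLVM": "BSD-like",
    "ant": "Apache Software License v2.0",
    "Apache Maven": "Apache Software License v2.0",
    "cmake": "BSD",
    "clang": "BSD",
    "Google gtest": "Apache Software License v2.0",
}

# Hypothetical allowlist for illustration only; it is NOT ASF policy.
REVIEWED_LABELS = {
    "Apache Software License v2.0",
    "BSD",
    "Boost Software License",
}


def group_by_license(deps):
    """Return a dict mapping each license label to its sorted dependency names."""
    groups = defaultdict(list)
    for name, label in deps.items():
        groups[label].append(name)
    return {label: sorted(names) for label, names in groups.items()}


def needs_review(deps, reviewed):
    """Return the sorted names whose license label is not in the allowlist."""
    return sorted(name for name, label in deps.items() if label not in reviewed)


if __name__ == "__main__":
    for label, names in sorted(group_by_license(DEPENDENCIES).items()):
        print(f"{label}: {', '.join(names)}")
    print("Needs a closer look:", needs_review(DEPENDENCIES, REVIEWED_LABELS))
```

Labels such as "BSD-style" or "Tencent" are imprecise, so anything outside the allowlist is simply flagged for a manual check rather than judged.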

##Required Resources

###Mailing List

There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:

* private@palo.incubator.apache.org
* dev@palo.incubator.apache.org
* commits@palo.incubator.apache.org

###Subversion Directory

Upon entering incubation: https://github.com/baidu/palo.
After incubation, we want to move the existing repo from
https://github.com/baidu/palo to Apache infrastructure.

###Issue Tracking

Palo currently uses GitHub to track issues. We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.

###Other Resources

The existing code already has unit tests, so we will make use of the existing Apache continuous testing infrastructure. The resulting load should not be very large.

##Initial Committers

* Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
* Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
* Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
* De Li (https://github.com/lide-reed, mailtolide@sina.com)
* Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
* Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
* Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)

##Affiliations

The initial committers are employees of Baidu Inc. The nominated mentors are employees of TODO.

##Sponsors

###Champion

TODO

###Nominated Mentors

* Sijie Guo, guosijie@gmail.com
* Luke Han, lukehan@apache.org
* Zheng Shao, zshao@apache.org

Mentors must be members of the IPMC and almost always Members of the ASF.

At this moment only Luke Han is qualified.

Regards,
Dave


###Sponsoring Entity

We are requesting the Incubator to sponsor this project.


Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Dave,

Thank you for your summary.

For #1, understood. We will find a new name as soon as possible, within several days.

For #2, regarding licensing, we are rechecking the licenses of all components in Palo, and we have fixed most of the issues we found, as I wrote in my last email. We will continue this work carefully.

For #3, we have reflected on Jim's suggestion. We will try to find or define a clean interface between Palo and Impala, and determine which parts should stay in Palo and which should be contributed as patches to Impala. The details and roadmap are still to be worked out.

For #4, I accept your suggestion and will update the proposal.

Once we have a new name, I will send you the updated proposal.

Best Regards,
Reed

From: Dave Fisher <da...@comcast.net>
Reply-To: <ge...@incubator.apache.org>
Date: Tuesday, June 19, 2018, 2:08 AM
To: <ge...@incubator.apache.org>
Subject: Re: Looking for Champion

Hi Li,De -

Since I agreed to champion this project, I think we need a summary of what the Incubator PMC cares about in order to accept a podling, and what the prospective project needs to address. We also need to be clear about what should happen during incubation and when. I think that many of the questions that came up in this thread had to do with assessing how much effort it will take to incubate Palo (or whatever the name will be).

(1) The name Palo. Since there seems to be an issue with that name we should have a new name. It is not unknown for a podling to change its name, but that does generate extra work for Infrastructure to change the name after podling start up. It would be our preference for Palo to find a new name prior to VOTING on the proposal. Please do this elsewhere and come back to me with the new name so that I can help with the updated proposal.

(2) Licensing of the software. Several bits came up as questionable. Regardless of cleanup that has already occurred we have identified that we will need to be very careful. It will be important to discuss and carefully handle the Software Grant Agreement to make sure that the source listed is correct. I think that the SGA must come early during incubation.

(3) Relationship with Impala. Palo has apparently forked portions of Impala. This means that some are concerned that there is a missed synergy with the Apache Impala project. Is there a clean interface that can be built between the projects? It would help if the Palo developers would explore this with Impala at dev@impala.apache.org.

That said, part of the Incubation process is to learn the Apache Way. IMHO it is ok for the relationship between Impala PMC and a podling PPMC to be a work in progress.

(4) Currently, Willem, Luke Han and Dave Fisher are qualified to officially mentor. I suggest that Sijie Guo and Zheng Shao be included as Initial Committers in order to help from within the PPMC.

On Jun 14, 2018, at 11:03 AM, Jim Apple <jb...@cloudera.com.INVALID> wrote:

I don't want to be a stickler, but I don't think "For issues mentioned by
Jim, Todd and Tim, I have replied on last Saturday."

To my email about Palo being an ASF project as a storage system without a
query engine, you replied only, "We will seriously consider this proposal."

I see no response to Tim's concern that "The code isn't owned by any
individual, I contributed it to Apache and it's
free for anyone to do what they want to do with it, but pulling in
improvements from other projects without any attempt to attribute it or
contribute improvements back seems contrary to the Apache way.”

Jim - do you need answers to these concerns prior to agreeing to accept this project into the Incubator?

Regards,
Dave


On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <li...@baidu.com> wrote:

Hi all,

About Palo, we have fixed the following issues.

1. Related to Impala
For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday.

2. License issues
For the issues mentioned by Todd and Ted:
1) be/aes/* comes from mysql-5.6, GPL v2.1 license
Fixed: removed the aes-related code.
https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced

2) be/util/mysql_dtoa.cpp is copied from MySQL, GPL license
Fixed: removed the mysql_dtoa-related code.
https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1

3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
Fixed: restored the original license; we are looking for another HTTP server to replace it.
https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831

4) be/rpc/*
Fixed: we have replaced it with brpc, and we will remove Hypertable in a few weeks, once users have upgraded to brpc.
https://github.com/baidu/palo/tree/master/be/src/rpc

3. Dependency licenses
For the issue mentioned by Dave: it looks like Palo does not depend on OpenLdap and cyrus-sasl directly, but some third-party libraries, such as libcurl and gperftools, need them to compile.
For rapidjson, we are looking for an alternative.

4. About the name of Palo
For the issue mentioned by Julian: we are working out a better one.

Best Regards,
Reed



On 2018/6/13 at 8:54 AM, "Li,De(BDG)" <li...@baidu.com> wrote:

Hi Julian,

Thank you.

It looks like we have to find another one.
If anyone has a good name, please feel free to let me know.

Best Regards,
Reed

On 2018/6/13 at 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:

Note that there is an existing database product called Palo - an open
source OLAP engine by German company Jedox[1]. There is a high
likelihood that Palo would have to change its name during incubation, if
accepted.

Julian

[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)



On Jun 10, 2018, at 3:49 AM, Han Luke <lu...@gmail.com>> wrote:

Cool Dave, it’s great to have you as the Champion.


________________________________
From: Tan,Zhongyi <ta...@baidu.com>
Sent: Saturday, June 9, 2018 8:16:28 AM
To: general@incubator.apache.org
Subject: Re: Looking for Champion

Thanks, Willem.

We really appreciate it.

On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:

Hi,

I'm willing to be the Mentor.
Please count me in.



Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net> wrote:

Hi -

I’m willing to Champion and Mentor. I have a couple of comments
inline.
I’ll look at dependency licenses later today. It’s early for me.


Re: Looking for Champion

Posted by Ted Dunning <te...@gmail.com>.

The licensing scrub does need to be done. But not necessarily before
incubation starts.



On Mon, Jun 18, 2018 at 10:13 PM Ryan Blue <rb...@netflix.com.invalid>
wrote:

> Okay, then let me rephrase: I would like to see a plan in the Palo proposal
> for a licensing scrub to be done before graduation.
>
> I'm still a little skeptical about this practice because the Incubator PMC
> validates the release on behalf of the foundation, but I think that's a
> separate issue to consider that doesn't need to distract on this Palo
> thread. Thanks for the explanation, Greg!
>
> On Mon, Jun 18, 2018 at 1:00 PM, Greg Stein <gs...@gmail.com> wrote:
>
> > Heya Ryan,
> >
> > On Mon, Jun 18, 2018 at 2:39 PM Ryan Blue <rb...@netflix.com> wrote:
> >
> >> > we have allowed (and IMO should continue) podlings to have licensing
> >> > issues during their incubator releases
> >>
> >> Thanks for pointing this out, Greg. I wasn't aware of this and have
> >> always had releases fail when we discover licensing issues. I think there's
> >> a significant risk of license problems, so I had assumed we would require a
> >> thorough scrub before the first release.
> >>
> >> What's the argument for finishing this work before graduation rather than
> >> first release? Isn't the release a product for which the ASF is legally
> >> responsible? Given that we fail releases for known license issues,
> >> shouldn't we also be more careful when we know there are likely to be
> >> issues?
> >>
> >
> > This is why incubator releases have a disclaimer. It gives them time to
> > work through dependency and licensing issues, even while they're testing
> > their release process with our KEYS and distribution framework. So the
> > "argument" is simply to allow the podling to multitask, rather than gate
> > one of their activities.
> >
> > When you really want to lift the cover, there isn't a problem if a podling
> > releases (say) a hard LGPL dependency. That's just a policy choice of the
> > Foundation, to avoid such dependencies. We don't like it, and maybe some
> > messed up licensing downstream, possibly, for somebody to tease apart. But
> > historically, the Incubator has let these issues slide for a while, yet
> > gate on graduation.
> >
> > I also feel that podling releases are in a grey area, that don't truly
> > have the full backing of the ASF (thus the disclaimer, and them not being a
> > TLP; although technically the Apache Incubator is the stand-in PMC behind
> > the release).
> >
> > Cheers,
> > -g
> >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Looking for Champion

Posted by Luke Han <lu...@gmail.com>.

The incubation process is how such issues get fixed; there is no need for
such a plan at this moment.

On Tue, Jun 19, 2018 at 4:13 AM, Ryan Blue <rb...@netflix.com.invalid> wrote:

> Okay, then let me rephrase: I would like to see a plan in the Palo proposal
> for a licensing scrub to be done before graduation.
>
> I'm still a little skeptical about this practice because the Incubator PMC
> validates the release on behalf of the foundation, but I think that's a
> separate issue to consider that doesn't need to distract on this Palo
> thread. Thanks for the explanation, Greg!
>
> On Mon, Jun 18, 2018 at 1:00 PM, Greg Stein <gs...@gmail.com> wrote:
>
> > Heya Ryan,
> >
> > On Mon, Jun 18, 2018 at 2:39 PM Ryan Blue <rb...@netflix.com> wrote:
> >
> >> > we have allowed (and IMO should continue) podlings to have licensing
> >> issues during their incubator releases
> >>
> >> Thanks for pointing this out, Greg. I wasn't aware of this and have
> >> always had releases fail when we discover licensing issues. I think
> there's
> >> a significant risk of license problems, so I had assumed we would
> require a
> >> thorough scrub before the first release.
> >>
> >> What's the argument for finishing this work before graduation rather
> than
> >> first release? Isn't the release a product for which the ASF is legally
> >> responsible? Given that we fail releases for known license issues,
> >> shouldn't we also be more careful when we know there are likely to be
> >> issues?
> >>
> >
> > This is why incubator releases have a disclaimer. It gives them time to
> > work through dependency and licensing issues, even while they're testing
> > their release process with our KEYS and distribution framework. So the
> > "argument" is simply to allow the podling to multitask, rather than gate
> > one of their activities.
> >
> > When you really want to lift the cover, there isn't a problem if a
> podling
> > releases (say) a hard LGPL dependency. That's just a policy choice of the
> > Foundation, to avoid such dependencies. We don't like it, and maybe some
> > messed up licensing downstream, possibly, for somebody to tease apart.
> But
> > historically, the Incubator has let these issues slide for a while, yet
> > gate on graduation.
> >
> > I also feel that podling releases are in a grey area, that don't truly
> > have the full backing of the ASF (thus the disclaimer, and them not
> being a
> > TLP; although technically the Apache Incubator is the stand-in PMC behind
> > the release).
> >
> > Cheers,
> > -g
> >
> >
>
>
> --
> Ryan Blue
> Software Engineer
> Netflix
>

Re: Looking for Champion

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

Okay, then let me rephrase: I would like to see a plan in the Palo proposal
for a licensing scrub to be done before graduation.

I'm still a little skeptical about this practice because the Incubator PMC
validates the release on behalf of the foundation, but I think that's a
separate issue to consider that doesn't need to distract on this Palo
thread. Thanks for the explanation, Greg!

On Mon, Jun 18, 2018 at 1:00 PM, Greg Stein <gs...@gmail.com> wrote:

> Heya Ryan,
>
> On Mon, Jun 18, 2018 at 2:39 PM Ryan Blue <rb...@netflix.com> wrote:
>
>> > we have allowed (and IMO should continue) podlings to have licensing
>> issues during their incubator releases
>>
>> Thanks for pointing this out, Greg. I wasn't aware of this and have
>> always had releases fail when we discover licensing issues. I think there's
>> a significant risk of license problems, so I had assumed we would require a
>> thorough scrub before the first release.
>>
>> What's the argument for finishing this work before graduation rather than
>> first release? Isn't the release a product for which the ASF is legally
>> responsible? Given that we fail releases for known license issues,
>> shouldn't we also be more careful when we know there are likely to be
>> issues?
>>
>
> This is why incubator releases have a disclaimer. It gives them time to
> work through dependency and licensing issues, even while they're testing
> their release process with our KEYS and distribution framework. So the
> "argument" is simply to allow the podling to multitask, rather than gate
> one of their activities.
>
> When you really want to lift the cover, there isn't a problem if a podling
> releases (say) a hard LGPL dependency. That's just a policy choice of the
> Foundation, to avoid such dependencies. We don't like it, and maybe some
> messed up licensing downstream, possibly, for somebody to tease apart. But
> historically, the Incubator has let these issues slide for a while, yet
> gate on graduation.
>
> I also feel that podling releases are in a grey area, that don't truly
> have the full backing of the ASF (thus the disclaimer, and them not being a
> TLP; although technically the Apache Incubator is the stand-in PMC behind
> the release).
>
> Cheers,
> -g
>
>


-- 
Ryan Blue
Software Engineer
Netflix

Re: Looking for Champion

Posted by Greg Stein <gs...@gmail.com>.

Heya Ryan,

On Mon, Jun 18, 2018 at 2:39 PM Ryan Blue <rb...@netflix.com> wrote:

> > we have allowed (and IMO should continue) podlings to have licensing
> issues during their incubator releases
>
> Thanks for pointing this out, Greg. I wasn't aware of this and have always
> had releases fail when we discover licensing issues. I think there's a
> significant risk of license problems, so I had assumed we would require a
> thorough scrub before the first release.
>
> What's the argument for finishing this work before graduation rather than
> first release? Isn't the release a product for which the ASF is legally
> responsible? Given that we fail releases for known license issues,
> shouldn't we also be more careful when we know there are likely to be
> issues?
>

This is why incubator releases have a disclaimer. It gives them time to
work through dependency and licensing issues, even while they're testing
their release process with our KEYS and distribution framework. So the
"argument" is simply to allow the podling to multitask, rather than gate
one of their activities.

When you really want to lift the cover, there isn't a problem if a podling
releases (say) a hard LGPL dependency. That's just a policy choice of the
Foundation, to avoid such dependencies. We don't like it, and maybe some
messed up licensing downstream, possibly, for somebody to tease apart. But
historically, the Incubator has let these issues slide for a while, yet
gate on graduation.

I also feel that podling releases are in a grey area, that don't truly have
the full backing of the ASF (thus the disclaimer, and them not being a TLP;
although technically the Apache Incubator is the stand-in PMC behind the
release).

Cheers,
-g

Re: Looking for Champion

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

> we have allowed (and IMO should continue) podlings to have licensing
issues during their incubator releases

Thanks for pointing this out, Greg. I wasn't aware of this and have always
had releases fail when we discover licensing issues. I think there's a
significant risk of license problems, so I had assumed we would require a
thorough scrub before the first release.

What's the argument for finishing this work before graduation rather than
first release? Isn't the release a product for which the ASF is legally
responsible? Given that we fail releases for known license issues,
shouldn't we also be more careful when we know there are likely to be
issues?

rb

On Mon, Jun 18, 2018 at 12:24 PM, Greg Stein <gs...@gmail.com> wrote:

> On Mon, Jun 18, 2018 at 2:08 PM Ryan Blue <rb...@netflix.com.invalid>
> wrote:
> >...
>
>> 2. The license problems so far show that the project has not paid adequate
>> attention to licensing up to now, which is a big risk. I'd like to see
>> what
>> kind of licensing scrub is proposed before the potential podling's first
>> release. I don't think that catching all the obvious ones is sufficient.
>
>
> To be clear: we have allowed (and IMO should continue) podlings to have
> licensing issues during their incubator releases. For example, while
> they're still dealing with Hibernate dependencies. It is understandable and
> (IMO) acceptable that such releases will have problems. That is just part
> of the process. As long as it gets cleaned up before graduation.
>
> Not diminishing the need for a good scrub, but I would not want to see
> releases gated on that. (it's unclear from your text; maybe just a *plan*
> rather than completion of the scrub?)
>
> Cheers,
> -g
>
>

-- 
Ryan Blue
Software Engineer
Netflix

Re: Looking for Champion

Posted by Greg Stein <gs...@gmail.com>.

On Mon, Jun 18, 2018 at 2:08 PM Ryan Blue <rb...@netflix.com.invalid> wrote:
>...

> 2. The license problems so far show that the project has not paid adequate
> attention to licensing up to now, which is a big risk. I'd like to see what
> kind of licensing scrub is proposed before the potential podling's first
> release. I don't think that catching all the obvious ones is sufficient.

To be clear: we have allowed (and IMO should continue) podlings to have
licensing issues during their incubator releases. For example, while
they're still dealing with Hibernate dependencies. It is understandable and
(IMO) acceptable that such releases will have problems. That is just part
of the process. As long as it gets cleaned up before graduation.

Not diminishing the need for a good scrub, but I would not want to see
releases gated on that. (it's unclear from your text; maybe just a *plan*
rather than completion of the scrub?)

Cheers,
-g

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Ryan,

Thank you for your response.

For #1, I admit we have not yet built an open and inclusive community,
but we have realized its importance.
So one of our aims in proposing to enter the Incubator is to build a
cooperative community and to work with others in the Apache Way.


For #2, it is not only "catching all the obvious ones". We have actually
scanned and rechecked all the code carefully and worked out a to-do list.
We are now working through it and fixing the items one by one.

Best Regards,
Reed

On 2018/6/19, 3:07 AM, "Ryan Blue" <rb...@netflix.com.INVALID> wrote:

>I agree with Jim, at least mostly.
>
>I don't mind code and toil duplication between projects in itself, but I
>think that the current state of the project shows that there are two large
>risks to the potential Palo podling (for lack of a better name):
>
>1. The choice not to work with the Impala community initially shows a risk
>of not working with others when it may be more difficult to do so than
>not.
>I think this should be directly addressed in the proposal: how do we know
>that this will be an open and inclusive community willing to work with
>others with slightly different goals?
>2. The license problems so far show that the project has not paid adequate
>attention to licensing up to now, which is a big risk. I'd like to see
>what
>kind of licensing scrub is proposed before the potential podling's first
>release. I don't think that catching all the obvious ones is sufficient.
>
>rb
>
>On Mon, Jun 18, 2018 at 11:51 AM, Jim Apple <jb...@cloudera.com.invalid>
>wrote:
>
>> I'm not a binding vote on incubator entry, but I think it would be
>> great to have roadmaps as soon as feasible on addressing Tim's concern
>> (which is deeply related to #2, "Licensing") and on addressing the
>> code and toil duplication.
>>
>> On Mon, Jun 18, 2018 at 11:08 AM, Dave Fisher <da...@comcast.net>
>> wrote:
>> > Hi Li,De -
>> >
>> > Since I agreed to champion this project I think that we need a summary
>> about
>> > what the Incubator PMC cares about in order to accept a podling. What
>>the
>> > prospective project needs to address. We also need to be clear what
>> should
>> > happen during Incubation and at what time. I think that many of the
>> > questions that came up in this thread had to do with assessing how
>>much
>> > effort it will take to Incubate Palo (or whatever the name will be)
>> >
>> > (1) The name Palo. Since there seems to be an issue with that name we
>> should
>> > have a new name. It is not unknown for a podling to change its name,
>>but
>> > that does generate extra work for Infrastructure to change the name
>>after
>> > podling start up. It would be our preference for Palo to find a new
>>name
>> > prior to VOTING on the proposal. Please do this elsewhere and come
>>back
>> to
>> > me with the new name so that I can help with the updated proposal.
>> >
>> > (2) Licensing of the software. Several bits came up as questionable.
>> > Regardless of cleanup that has already occurred we have identified
>>that
>> we
>> > will need to be very careful. It will be important to discuss and
>> carefully
>> > handle the Software Grant Agreement to make sure that the source
>>listed
>> is
>> > correct. I think that the SGA must come early during incubation.
>> >
>> > (3) Relationship with Impala. Palo has apparently forked portions of
>> Impala.
>> > This means that some are concerned that there is a missed synergy with
>> the
>> > Apache Impala project. Is there a clean interface that can be built
>> between
>> > the projects? It would help if the Palo developers would explore this
>> with
>> > Impala at dev@impala.apache.org.
>> >
>> > That said, part of the Incubation process is to learn the Apache Way.
>> IMHO
>> > it is ok for the relationship between Impala PMC and a podling PPMC to
>> be a
>> > work in progress.
>> >
>> > (4) Currently, Willem, Luke Han and Dave Fisher are qualified to
>> officially
>> > mentor. I suggest that Sijie Guo and Zheng Shao be included as Initial
>> > Committers in order to help from within the PPMC.
>> >
>> > On Jun 14, 2018, at 11:03 AM, Jim Apple <jb...@cloudera.com.INVALID>
>> > wrote:
>> >
>> > I don't want to be a stickler, but I don't think "For issues
>>mentioned by
>> > Jim, Todd and Tim, I have replied on last Saturday."
>> >
>> > To my email about Palo being an ASF project as a storage system
>>without a
>> > query engine, you replied only, "We will seriously consider this
>> proposal."
>> >
>> > I see no response to Tim's concern that "The code isn't owned by any
>> > individual, I contributed it to Apache and it's
>> > free for anyone to do what they want to do with it, but pulling in
>> > improvements from other projects without any attempt to attribute it
>>or
>> > contribute improvements back seems contrary to the Apache way.”
>> >
>> >
>> > Jim - do you need answers to these concerns prior to agreeing to
>>accept
>> this
>> > project into the Incubator?
>> >
>> > Regards,
>> > Dave
>> >
>> >
>> > On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <li...@baidu.com> wrote:
>> >
>> > Hi all,
>> >
>> > About Palo, we have fixed following issues.
>> >
>> > 1. Related Impala
>> > For issues mentioned by Jim, Todd and Tim, I have replied on last
>> Saturday.
>> >
>> > 2. License issue
>> > For issues mentioned by Todd and Ted.
>> > 1) be/aes/* comes from mysql-5.6, GPL v2.1 license
>> > Fixed: removed the AES-related code.
>> > https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
>> > https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced
>> >
>> > 2) be/util/mysql_dtoa.cpp was copied from MySQL, GPL license
>> > Fixed: removed the mysql_dtoa-related code.
>> > https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1
>> >
>> > 3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
>> > Fixed: restored the original license; we are looking for another HTTP
>> > server to replace it.
>> > https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831
>> >
>> > 4) be/rpc/*
>> > Fixed: We have replaced it with brpc, and we will remove Hypertable in
>> > a few weeks, once users have had time to upgrade to brpc.
>> > https://github.com/baidu/palo/tree/master/be/src/rpc
>> >
>> > 3. Dependency licenses
>> > For the issue mentioned by Dave: it looks like Palo does not depend on
>> > OpenLdap and cyrus-sasl directly,
>> > but some third-party libraries need them to compile, libcurl and
>> > gperftools for instance.
>> > For rapidjson, we are looking for an alternative.
>> >
>> > 4. About the name of Palo
>> > For the issue mentioned by Julian.
>> > We are figuring out a better one.
>> >
>> > Best Regards,
>> > Reed
>> >
>> >
>> >
>> > On 2018/6/13, 8:54 AM, "Li,De(BDG)" <li...@baidu.com> wrote:
>> >
>> > Hi Julian,
>> >
>> > Thank you.
>> >
>> > It looks like that we have to find another one.
>> > If anyone has a good name, please feel free to let me know.
>> >
>> > Best Regards,
>> > Reed
>> >
>> > On 2018/6/13, 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:
>> >
>> > Note that there is an existing database product called Palo - an open
>> > source OLAP engine by German company Jedox[1]. There is a high
>> > likelihood that Palo would have to change its name during incubation,
>> > if accepted.
>> >
>> > Julian
>> >
>> > [1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
>> >
>> >
>> >
>> > On Jun 10, 2018, at 3:49 AM, Han Luke <lu...@gmail.com> wrote:
>> >
>> > Cool Dave, it’s great to have you as the Champion.
>> >
>> >
>> > ________________________________
>> > From: Tan,Zhongyi <tanzhongyi@baidu.com>
>> > Sent: Saturday, June 9, 2018 8:16:28 AM
>> > To: general@incubator.apache.org
>> > Subject: Re: Looking for Champion
>> >
>> > Thanks, Willem.
>> >
>> > We appreciate it very much.
>> >
>> > On Jun 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I'm willing to be the Mentor.
>> > Please count me in.
>> >
>> >
>> >
>> > Willem Jiang
>> >
>> > Twitter: willemjiang
>> > Weibo: 姜宁willem
>> >
>> > On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net>
>> > wrote:
>> >
>> > Hi -
>> >
>> > I’m willing to Champion and Mentor. I have a couple of comments
>> > inline.
>> > I’ll look at dependency licenses later today. It’s early for me.
>> >
>> >
>> > On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
>> >
>> > Hi all,
>> >
>> > I am Reed, as a developer worked with the team for Palo (a MPP-based
>> >
>> > interactive SQL data warehousing).
>> >
>> > https://github.com/baidu/palo/wiki/Palo-Overview
>> >
>> > We propose to contribute Palo as an Apache Incubator project, and
>> > we are still looking for possible Champion if anyone would like to
>> >
>> > volunteer. Thanks a lot.
>> >
>> >
>> > Best Regards,
>> > Reed
>> >
>> > ===================
>> > The draft of the proposal as below:
>> >
>> > #Apache Palo
>> >
>> > ##Abstract
>> >
>> > Palo is a MPP-based interactive SQL data warehousing for reporting
>> > and
>> >
>> > analysis.
>> >
>> >
>> > ##Proposal
>> >
>> > We propose to contribute the Palo codebase and associated artifacts
>> >
>> > (e.g. documentation, web-site content etc.) to the Apache Software
>> > Foundation with the intent of forming a productive, meritocratic and
>> > open
>> > community around Palo’s continued development, according to the
>> > ‘Apache
>> > Way’.
>> >
>> >
>> > Baidu owns several trademarks regarding Palo, and proposes to
>> > transfer
>> >
>> > ownership of those trademarks in full to the ASF.
>> >
>> >
>> > ###Overview of Palo
>> >
>> > Palo’s implementation consists of two daemons: Frontend (FE) and
>> > Backend
>> >
>> > (BE).
>> >
>> >
>> > **Frontend daemon** consists of query coordinator and catalog
>> > manager.
>> >
>> > Query coordinator is responsible for receiving users’ sql queries,
>> > compiling queries and managing queries execution. Catalog manager is
>> > responsible for managing metadata such as databases, tables,
>> > partitions,
>> > replicas and etc. Several frontend daemons could be deployed to
>> > guarantee
>> > fault-tolerance, and load balancing.
>> >
>> >
>> > **Backend daemon** stores the data and executes the query fragments.
>> >
>> > Many backend daemons could also be deployed to provide scalability
>> > and
>> > fault-tolerance.
>> >
>> >
>> > A typical Palo cluster generally composes of several frontend
>> > daemons
>> >
>> > and dozens to hundreds of backend daemons.
>> >
>> >
>> > Users can use MySQL client tools to connect any frontend daemon to
>> >
>> > submit SQL query. Frontend receives the query and compiles it into
>> > query
>> > plans executable by the Backend. Then Frontend sends the query plan
>> > fragments to Backend. Backend will build a query execution DAG. Data
>> > is
>> > fetched and pipelined into the DAG. The final result response is sent
>> > to
>> > client via Frontend. The distribution of query fragment execution
>> > takes
>> > minimizing data movement and maximizing scan locality as the main
>> > goal.
>> >
>> >
>> > ##Background
>> >
>> > At Baidu, Prior to Palo, different tools were deployed to solve
>> > diverse
>> >
>> > requirements in many ways. And when a use case requires the
>> > simultaneous
>> > availability of capabilities that cannot all be provided by a single
>> > tool,
>> > users were forced to build hybrid architectures that stitch multiple
>> > tools
>> > together, but we believe that they shouldn’t need to accept such
>> > inherent
>> > complexity. A storage system built to provide great performance
>> > across a
>> > broad range of workloads provides a more elegant solution to the
>> > problems
>> > that hybrid architectures aim to solve. Palo is the solution.
>> >
>> >
>> > Palo is designed to be a simple and single tightly coupled system,
>> > not
>> >
>> > depending on other systems. Palo provides high concurrent low latency
>> > point
>> > query performance, but also provides high throughput queries of
>> > ad-hoc
>> > analysis. Palo provides bulk-batch data loading, but also provides
>> > near
>> > real-time mini-batch data loading. Palo also provides high
>> > availability,
>> > reliability, fault tolerance, and scalability.
>> >
>> >
>> > ##Rationale
>> >
>> > Palo mainly integrates the technology of Google Mesa and Apache
>> > Impala.
>> >
>> > Mesa is a highly scalable analytic data storage system that stores
>> >
>> > critical measurement data related to Google's Internet advertising
>> > business. Mesa is designed to satisfy complex and challenging set of
>> > users’
>> > and systems’ requirements, including near real-time data ingestion
>> > and
>> > query ability, as well as high availability, reliability, fault
>> > tolerance,
>> > and scalability for large data and query volumes.
>> >
>> >
>> > Impala is a modern, open-source MPP SQL engine architected from the
>> >
>> > ground up for the Hadoop data processing environment. At present, by
>> > virtue
>> > of its superior performance and rich functionality， Impala has been
>> > comparable to many commercial MPP database query engine. Mesa can
>> > satisfy
>> > the needs of many of our storage requirements, however Mesa itself
>> > does not
>> > provide a SQL query engine; Impala is a very good MPP SQL query
>> > engine, but
>> > the lack of a perfect distributed storage engine. So in the end we
>> > chose
>> > the combination of these two technologies.
>> >
>> >
>> > Learning from Mesa’s data model, we developed a distributed storage
>> >
>> > engine. Unlike Mesa, this storage engine does not rely on any
>> > distributed
>> > file system. Then we deeply integrate this storage engine with Impala
>> > query
>> > engine. Query compiling, query execution coordination and catalog
>> > management of storage engine are integrated to be frontend daemon;
>> > query
>> > execution and data storage are integrated to be backend daemon. With
>> > this
>> > integration, we implemented a single, full-featured, high performance
>> > state
>> > the art of MPP database, as well as maintaining the simplicity.
>> >
>> >
>> > ##Current Status
>> >
>> > Palo has been an open source project on GitHub (
>> >
>> > https://github.com/baidu/palo).
>> >
>> >
>> > ###Meritocracy
>> >
>> > Palo has been deployed in production at Baidu and is applying more
>> > than
>> >
>> > 200 lines of business. It has demonstrated great performance benefits
>> > and
>> > has proved to be a better way for reporting and analysis based big
>> > data.
>> > Still We look forward to growing a rich user and developer community.
>> >
>> >
>> > ###Community
>> >
>> > Palo seeks to develop developer and user communities during
>> > incubation.
>> >
>> > ###Core Developers
>> >
>> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> >
>> >
>> > ###Alignment
>> >
>> > Palo is related to several other Apache projects:
>> >
>> > * Palo can also read data stored in Apache Hadoop clusters powered
>> > by
>> >
>> > the HDFS filesystem.
>> >
>> > * Palo is closely integrated with Impala, which is also being
>> > proposed
>> >
>> > to the Incubator.
>> >
>> > Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>> >
>> > * Palo uses Apache Thrift as its RPC and serialization framework of
>> >
>> > choice.
>> >
>> >
>> > ##Known Risks
>> >
>> > ###Orphaned Products
>> >
>> > The core developers of Palo team plan to work full time on this
>> > project.
>> >
>> > There is very little risk of Palo getting orphaned since at least one
>> > large
>> > company (Baidu) is extensively using it in their production. For
>> > example,
>> > currently there are more than 200 use cases using Palo in production.
>> > Furthermore, since Palo was open sourced at the beginning of October
>> > 2017,
>> > it has received more than 660 stars and been forked nearly 170 times.
>> > We
>> > plan to extend and diversify this community further through Apache.
>> >
>> >
>> > ###Inexperience with Open Source
>> >
>> > The core developers are all active users and followers of open
>> > source.
>> >
>> > They are already committers and contributors to the Palo Github
>> > project.
>> > All have been involved with the source code that has been released
>> > under an
>> > open source license, and several of them also have experience
>> > developing
>> > code in an open source environment. Though the core set of Developers
>> > do
>> > not have Apache Open Source experience, there are plans to onboard
>> > individuals with Apache open source experience on to the project.
>> >
>> >
>> > ###Homogenous Developers
>> >
>> > The most of core developers are from Baidu, but after Palo was open
>> >
>> > sourced, Palo received a lot of bug fixes and enhancements from other
>> > developers not working at Baidu.
>> >
>> >
>> > ###Reliance on Salaried Developers
>> >
>> > Baidu invested in Palo as the OLAP solution and some of its key
>> >
>> > engineers are working full time on the project. In addition, since
>> > there is
>> > a growing Big Data need for scalable OLAP solutions, we look forward
>> > to
>> > other Apache developers and researchers to contribute to the project.
>> > Also
>> > key to addressing the risk associated with relying on Salaried
>> > developers
>> > from a single entity is to increase the diversity of the contributors
>> > and
>> > actively lobby for Domain experts in the BI space to contribute.
>> > Apache
>> > Palo intends to do this.
>> >
>> >
>> > ###An Excessive Fascination with the Apache Brand
>> >
>> > Palo is proposing to enter incubation at Apache in order to help
>> > efforts
>> >
>> > to diversify the committer-base, not so much to capitalize on the
>> > Apache
>> > brand. The Palo project is in production use already inside Baidu,
>> > but is
>> > not expected to be an Baidu product for external customers. As such,
>> > the
>> > Palo project is not seeking to use the Apache brand as a marketing
>> > tool.
>> >
>> >
>> > ##Documentation
>> >
>> > Information about Palo can be found at
>> > https://github.com/baidu/palo.
>> >
>> > The following links provide more information about Palo in open
>> > source:
>> >
>> >
>> > * Palo wiki site: https://github.com/baidu/palo/wiki
>> > * Codebase at Github: https://github.com/baidu/palo
>> > * Issue Tracking: https://github.com/baidu/palo/issues
>> > * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> > * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> >
>> > ##Initial Source
>> >
>> > Palo has been under development since 2017 by a team of engineers at
>> >
>> > Baidu Inc. It is currently hosted on Github.com under an Apache
>> > license at
>> > https://github.com/baidu/palo.
>> >
>> >
>> > ##External Dependencies
>> >
>> > Palo has the following external dependencies.
>> >
>> > * Google gflags (BSD)
>> > * Google glog (BSD)
>> > * Apache Thrift (Apache Software License v2.0)
>> > * Apache Commons (Apache Software License v2.0)
>> > * Boost (Boost Software License)
>> > * OpenLdap (OpenLDAP Software License)
>> > * rapidjson (Tencent)
>> > * Google RE2 (BSD-style)
>> > * lz4 (BSD)
>> > * snappy (BSD)
>> > * cyrus-sasl (CMU License)
>> > * Twitter Bootstrap (Apache Software License v2.0)
>> > * d3 (BSD)
>> > * LLVM (BSD-like)
>> >
>> > Build and test dependencies:
>> >
>> > * ant (Apache Software License v2.0)
>> > * Apache Maven (Apache Software License v2.0)
>> > * cmake (BSD)
>> > * clang (BSD)
>> > * Google gtest (Apache Software License v2.0)
>> >
>> > ##Required Resources
>> >
>> > ###Mailing List
>> >
>> > There are currently no mailing lists. The usual mailing lists are
>> >
>> > expected to be set up when entering incubation:
>> >
>> >
>> > private@palo.incubator.apache.org
>> > dev@palo.incubator.apache.org
>> > commits@palo.incubator.apache.org
>> >
>> >
>> > ###Subversion Directory
>> >
>> > Upon entering incubation: https://github.com/baidu/palo.
>> > After incubation, we want to move the existing repo from
>> >
>> > https://github.com/baidu/palo to Apache infrastructure.
>> >
>> >
>> > ###Issue Tracking
>> >
>> > Palo currently uses GitHub to track issues. Would like to continue
>> > to do
>> >
>> > so while we discuss migration possibilities with the ASF Infra
>> > committee.
>> >
>> >
>> > ###Other Resources
>> >
>> > The existing code already has unit tests so we will make use of
>> > existing
>> >
>> > Apache continuous testing infrastructure. The resulting load should
>> > not be
>> > very large.
>> >
>> >
>> > ##Initial Committers
>> >
>> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> >
>> >
>> > ##Affiliations
>> >
>> > The initial committers are employees of Baidu Inc.. The nominated
>> >
>> > mentors are employees of TODO.
>> >
>> >
>> > ##Sponsors
>> >
>> > ###Champion
>> >
>> > TODO
>> >
>> > ###Nominated Mentors
>> >
>> > * Sijie Guo, guosijie@gmail.com
>> > * Luke Han, lukehan@apache.org
>> > * Zheng Shao, zshao@apache.org
>> >
>> >
>> > Mentors must be members of the IPMC and almost always Members of the
>> > ASF. At this moment only Luke Han is qualified.
>> >
>> > Regards,
>> > Dave
>> >
>> >
>> > ###Sponsoring Entity
>> >
>> > We are requesting the Incubator to sponsor this project.
>> >
>> >
>> >
>>
>>
>>
>
>
>-- 
>Ryan Blue
>Software Engineer
>Netflix

Re: Looking for Champion

Posted by Ryan Blue <rb...@netflix.com.INVALID>.

I agree with Jim, at least mostly.

I don't mind code and toil duplication between projects in itself, but I
think that the current state of the project shows that there are two large
risks to the potential Palo podling (for lack of a better name):

1. The initial choice not to work with the Impala community shows a risk
of not working with others when collaborating is harder than going it alone.
I think this should be directly addressed in the proposal: how do we know
that this will be an open and inclusive community willing to work with
others with slightly different goals?
2. The license problems so far show that the project has not paid adequate
attention to licensing up to now, which is a big risk. I'd like to see what
kind of licensing scrub is proposed before the potential podling's first
release. I don't think that catching all the obvious ones is sufficient.

rb

On Mon, Jun 18, 2018 at 11:51 AM, Jim Apple <jb...@cloudera.com.invalid>
wrote:

> I'm not a binding vote on incubator entry, but I think it would be
> great to have roadmaps as soon as feasible on addressing Tim's concern
> (which is deeply related to #2, "Licensing") and on addressing the
> code and toil duplication.
>
> On Mon, Jun 18, 2018 at 11:08 AM, Dave Fisher <da...@comcast.net>
> wrote:
> > Hi Li,De -
> >
> > Since I agreed to champion this project I think that we need a summary
> > about what the Incubator PMC cares about in order to accept a podling and
> > what the prospective project needs to address. We also need to be clear
> > what should happen during Incubation and at what time. I think that many
> > of the questions that came up in this thread had to do with assessing how
> > much effort it will take to Incubate Palo (or whatever the name will be).
> >
> > (1) The name Palo. Since there seems to be an issue with that name we
> > should have a new name. It is not unknown for a podling to change its
> > name, but that does generate extra work for Infrastructure to change the
> > name after podling start up. It would be our preference for Palo to find
> > a new name prior to VOTING on the proposal. Please do this elsewhere and
> > come back to me with the new name so that I can help with the updated
> > proposal.
> >
> > (2) Licensing of the software. Several bits came up as questionable.
> > Regardless of cleanup that has already occurred we have identified that
> > we will need to be very careful. It will be important to discuss and
> > carefully handle the Software Grant Agreement to make sure that the
> > source listed is correct. I think that the SGA must come early during
> > incubation.
> >
> > (3) Relationship with Impala. Palo has apparently forked portions of
> > Impala. This means that some are concerned that there is a missed synergy
> > with the Apache Impala project. Is there a clean interface that can be
> > built between the projects? It would help if the Palo developers would
> > explore this with Impala at dev@impala.apache.org.
> >
> > That said, part of the Incubation process is to learn the Apache Way.
> > IMHO it is ok for the relationship between Impala PMC and a podling PPMC
> > to be a work in progress.
> >
> > (4) Currently, Willem, Luke Han and Dave Fisher are qualified to
> > officially mentor. I suggest that Sijie Guo and Zheng Shao be included as
> > Initial Committers in order to help from within the PPMC.
> >
> > On Jun 14, 2018, at 11:03 AM, Jim Apple <jb...@cloudera.com.INVALID>
> > wrote:
> >
> > I don't want to be a stickler, but I don't think "For issues mentioned by
> > Jim, Todd and Tim, I have replied on last Saturday."
> >
> > To my email about Palo being an ASF project as a storage system without a
> > query engine, you replied only, "We will seriously consider this
> > proposal."
> >
> > I see no response to Tim's concern that "The code isn't owned by any
> > individual, I contributed it to Apache and it's
> > free for anyone to do what they want to do with it, but pulling in
> > improvements from other projects without any attempt to attribute it or
> > contribute improvements back seems contrary to the Apache way.”
> >
> >
> > Jim - do you need answers to these concerns prior to agreeing to accept
> > this project into the Incubator?
> >
> > Regards,
> > Dave
> >
> >
> > On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <li...@baidu.com> wrote:
> >
> > Hi all,
> >
> > About Palo, we have fixed the following issues.
> >
> > 1. Related to Impala
> > For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday.
> >
> > 2. License issues
> > For the issues mentioned by Todd and Ted:
> > 1) be/aes/* comes from mysql-5.6, GPL v2.1 license
> > Fixed: removed the AES-related code.
> > https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
> > https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced
> >
> > 2) be/util/mysql_dtoa.cpp was copied from MySQL, GPL license
> > Fixed: removed the mysql_dtoa-related code.
> > https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1
> >
> > 3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
> > Fixed: restored the original license; we are looking for another HTTP
> > server to replace it.
> > https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831
> >
> > 4) be/rpc/*
> > Fixed: We have replaced it with brpc, and we will remove Hypertable in a
> > few weeks, once users have upgraded to brpc.
> > https://github.com/baidu/palo/tree/master/be/src/rpc
> >
> > 3. Dependency licenses
> > For the issue mentioned by Dave: it looks like Palo does not depend on
> > OpenLdap and cyrus-sasl directly, but some third-party libraries
> > (libcurl and gperftools, for instance) need them to compile.
> > For rapidjson, we are looking for an alternative.
> >
> > 4. About the name of Palo
> > For the issue mentioned by Julian:
> > We are working on finding a better one.
> >
> > Best Regards,
> > Reed
> >
> >
> >
> > On 2018/6/13 at 8:54 AM, "Li,De(BDG)" <li...@baidu.com> wrote:
> >
> > Hi Julian,
> >
> > Thank you.
> >
> > It looks like we have to find another name.
> > If anyone has a good name, please feel free to let me know.
> >
> > Best Regards,
> > Reed
> >
> > On 2018/6/13 at 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:
> >
> > Note that there is an existing database product called Palo - an open
> > source OLAP engine by German company Jedox[1]. There is a high
> > likelihood that Palo would have to change its name during incubation, if
> > accepted.
> >
> > Julian
> >
> > [1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
> >
> >
> >
> > On Jun 10, 2018, at 3:49 AM, Han Luke <lu...@gmail.com> wrote:
> >
> > Cool Dave, it’s great to have you as the Champion.
> >
> >
> > ________________________________
> > From: Tan,Zhongyi <tanzhongyi@baidu.com>
> > Sent: Saturday, June 9, 2018 8:16:28 AM
> > To: general@incubator.apache.org
> > Subject: Re: Looking for Champion
> >
> > Thanks, Willem.
> >
> > We really appreciate it.
> >
> > On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:
> >
> > Hi,
> >
> > I'm willing to be the Mentor.
> > Please count me in.
> >
> >
> >
> > Willem Jiang
> >
> > Twitter: willemjiang
> > Weibo: 姜宁willem
> >
> > On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net>
> > wrote:
> >
> > Hi -
> >
> > I’m willing to Champion and Mentor. I have a couple of comments inline.
> > I’ll look at dependency licenses later today. It’s early for me.
> >
> >
> > On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
> >
> > Hi all,
> >
> > I am Reed, a developer working with the team for Palo (an MPP-based
> > interactive SQL data warehouse).
> >
> > https://github.com/baidu/palo/wiki/Palo-Overview
> >
> > We propose to contribute Palo as an Apache Incubator project, and
> > we are still looking for a possible Champion if anyone would like to
> > volunteer. Thanks a lot.
> >
> >
> > Best Regards,
> > Reed
> >
> > ===================
> > The draft of the proposal is below:
> >
> > #Apache Palo
> >
> > ##Abstract
> >
> > Palo is an MPP-based interactive SQL data warehouse for reporting and
> > analysis.
> >
> >
> > ##Proposal
> >
> > We propose to contribute the Palo codebase and associated artifacts
> > (e.g. documentation, web-site content etc.) to the Apache Software
> > Foundation with the intent of forming a productive, meritocratic and
> > open community around Palo’s continued development, according to the
> > ‘Apache Way’.
> >
> >
> > Baidu owns several trademarks regarding Palo, and proposes to transfer
> > ownership of those trademarks in full to the ASF.
> >
> >
> > ###Overview of Palo
> >
> > Palo’s implementation consists of two daemons: Frontend (FE) and
> > Backend (BE).
> >
> >
> > **Frontend daemon** consists of a query coordinator and a catalog
> > manager. The query coordinator is responsible for receiving users’ SQL
> > queries, compiling them and managing their execution. The catalog manager
> > is responsible for managing metadata such as databases, tables,
> > partitions and replicas. Several frontend daemons can be deployed to
> > guarantee fault tolerance and load balancing.
> >
> >
> > **Backend daemon** stores the data and executes the query fragments.
> > Many backend daemons can also be deployed to provide scalability and
> > fault tolerance.
> >
> >
> > A typical Palo cluster generally consists of several frontend daemons
> > and dozens to hundreds of backend daemons.
> >
> >
> > Users can use MySQL client tools to connect to any frontend daemon and
> > submit SQL queries. The Frontend receives a query and compiles it into
> > query plans executable by the Backend. The Frontend then sends the query
> > plan fragments to the Backend, which builds a query execution DAG. Data
> > is fetched and pipelined into the DAG. The final result is sent to the
> > client via the Frontend. The distribution of query fragment execution has
> > minimizing data movement and maximizing scan locality as its main goals.
> >
> >
> > ##Background
> >
> > At Baidu, prior to Palo, different tools were deployed to meet diverse
> > requirements in many ways. When a use case required the simultaneous
> > availability of capabilities that could not all be provided by a single
> > tool, users were forced to build hybrid architectures that stitched
> > multiple tools together. We believe that they shouldn’t need to accept
> > such inherent complexity. A storage system built to provide great
> > performance across a broad range of workloads provides a more elegant
> > solution to the problems that hybrid architectures aim to solve. Palo is
> > that solution.
> >
> >
> > Palo is designed to be a simple, single, tightly coupled system that does
> > not depend on other systems. Palo provides highly concurrent, low-latency
> > point query performance as well as high-throughput ad-hoc analytical
> > queries. Palo provides bulk batch data loading as well as near real-time
> > mini-batch data loading. Palo also provides high availability,
> > reliability, fault tolerance, and scalability.
> >
> >
> > ##Rationale
> >
> > Palo mainly integrates the technology of Google Mesa and Apache Impala.
> >
> > Mesa is a highly scalable analytic data storage system that stores
> > critical measurement data related to Google's Internet advertising
> > business. Mesa is designed to satisfy a complex and challenging set of
> > user and system requirements, including near real-time data ingestion
> > and query ability, as well as high availability, reliability, fault
> > tolerance, and scalability for large data and query volumes.
> >
> >
> > Impala is a modern, open-source MPP SQL engine architected from the
> > ground up for the Hadoop data processing environment. At present, by
> > virtue of its superior performance and rich functionality, Impala is
> > comparable to many commercial MPP database query engines. Mesa can
> > satisfy many of our storage requirements, but Mesa itself does not
> > provide a SQL query engine; Impala is a very good MPP SQL query engine,
> > but lacks a perfect distributed storage engine. So in the end we chose
> > the combination of these two technologies.
> >
> >
> > Learning from Mesa’s data model, we developed a distributed storage
> > engine. Unlike Mesa, this storage engine does not rely on any distributed
> > file system. We then deeply integrated this storage engine with the
> > Impala query engine. Query compilation, query execution coordination and
> > catalog management of the storage engine are integrated into the frontend
> > daemon; query execution and data storage are integrated into the backend
> > daemon. With this integration, we implemented a single, full-featured,
> > high-performance, state-of-the-art MPP database while maintaining
> > simplicity.
> >
> >
> > ##Current Status
> >
> > Palo has been an open source project on GitHub (https://github.com/baidu/palo).
> >
> >
> > ###Meritocracy
> >
> > Palo has been deployed in production at Baidu and serves more than 200
> > lines of business. It has demonstrated great performance benefits and
> > has proved to be a better way to do reporting and analysis on big data.
> > Still, we look forward to growing a rich user and developer community.
> >
> >
> > ###Community
> >
> > Palo seeks to develop developer and user communities during incubation.
> >
> > ###Core Developers
> >
> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >
> >
> > ###Alignment
> >
> > Palo is related to several other Apache projects:
> >
> > * Palo can also read data stored in Apache Hadoop clusters powered by
> > the HDFS filesystem.
> > * Palo is closely integrated with Impala, which is also being proposed
> > to the Incubator.
> >
> > Apache Impala has completed Incubation. Jim Apple is VP, Impala.
> >
> > * Palo uses Apache Thrift as its RPC and serialization framework of
> > choice.
> >
> >
> > ##Known Risks
> >
> > ###Orphaned Products
> >
> > The core developers of the Palo team plan to work full time on this
> > project. There is very little risk of Palo getting orphaned since at
> > least one large company (Baidu) is using it extensively in production.
> > For example, there are currently more than 200 use cases using Palo in
> > production. Furthermore, since Palo was open sourced at the beginning of
> > October 2017, it has received more than 660 stars and been forked nearly
> > 170 times. We plan to extend and diversify this community further through
> > Apache.
> >
> >
> > ###Inexperience with Open Source
> >
> > The core developers are all active users and followers of open source.
> > They are already committers and contributors to the Palo GitHub project.
> > All have been involved with source code that has been released under an
> > open source license, and several of them also have experience developing
> > code in an open source environment. Though the core developers do not
> > have Apache open source experience, there are plans to onboard
> > individuals with Apache open source experience onto the project.
> >
> >
> > ###Homogenous Developers
> >
> > Most of the core developers are from Baidu, but after Palo was open
> > sourced, it received a lot of bug fixes and enhancements from
> > developers not working at Baidu.
> >
> >
> > ###Reliance on Salaried Developers
> >
> > Baidu invested in Palo as its OLAP solution and some of its key
> > engineers are working full time on the project. In addition, since there
> > is a growing Big Data need for scalable OLAP solutions, we look forward
> > to other Apache developers and researchers contributing to the project.
> > Key to addressing the risk associated with relying on salaried developers
> > from a single entity is to increase the diversity of the contributors and
> > actively lobby for domain experts in the BI space to contribute. Apache
> > Palo intends to do this.
> >
> >
> > ###An Excessive Fascination with the Apache Brand
> >
> > Palo is proposing to enter incubation at Apache in order to help efforts
> > to diversify the committer base, not so much to capitalize on the Apache
> > brand. The Palo project is in production use already inside Baidu, but is
> > not expected to be a Baidu product for external customers. As such, the
> > Palo project is not seeking to use the Apache brand as a marketing tool.
> >
> >
> > ##Documentation
> >
> > Information about Palo can be found at https://github.com/baidu/palo.
> > The following links provide more information about Palo in open source:
> >
> >
> > * Palo wiki site: https://github.com/baidu/palo/wiki
> > * Codebase at Github: https://github.com/baidu/palo
> > * Issue Tracking: https://github.com/baidu/palo/issues
> > * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> > * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
> >
> > ##Initial Source
> >
> > Palo has been under development since 2017 by a team of engineers at
> > Baidu Inc. It is currently hosted on Github.com under an Apache license
> > at https://github.com/baidu/palo.
> >
> >
> > ##External Dependencies
> >
> > Palo has the following external dependencies.
> >
> > * Google gflags (BSD)
> > * Google glog (BSD)
> > * Apache Thrift (Apache Software License v2.0)
> > * Apache Commons (Apache Software License v2.0)
> > * Boost (Boost Software License)
> > * OpenLdap (OpenLDAP Software License)
> > * rapidjson (Tencent)
> > * Google RE2 (BSD-style)
> > * lz4 (BSD)
> > * snappy (BSD)
> > * cyrus-sasl (CMU License)
> > * Twitter Bootstrap (Apache Software License v2.0)
> > * d3 (BSD)
> > * LLVM (BSD-like)
> >
> > Build and test dependencies:
> >
> > * ant (Apache Software License v2.0)
> > * Apache Maven (Apache Software License v2.0)
> > * cmake (BSD)
> > * clang (BSD)
> > * Google gtest (Apache Software License v2.0)
> >
> > ##Required Resources
> >
> > ###Mailing List
> >
> > There are currently no mailing lists. The usual mailing lists are
> > expected to be set up when entering incubation:
> >
> >
> > private@palo.incubator.apache.org
> > dev@palo.incubator.apache.org
> > commits@palo.incubator.apache.org
> >
> >
> > ###Subversion Directory
> >
> > Upon entering incubation: https://github.com/baidu/palo.
> > After incubation, we want to move the existing repo from
> > https://github.com/baidu/palo to Apache infrastructure.
> >
> >
> > ###Issue Tracking
> >
> > Palo currently uses GitHub to track issues. We would like to continue
> > to do so while we discuss migration possibilities with the ASF Infra
> > committee.
> >
> >
> > ###Other Resources
> >
> > The existing code already has unit tests so we will make use of
> > existing Apache continuous testing infrastructure. The resulting load
> > should not be very large.
> >
> >
> > ##Initial Committers
> >
> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >
> >
> > ##Affiliations
> >
> > The initial committers are employees of Baidu Inc. The nominated
> > mentors are employees of TODO.
> >
> >
> > ##Sponsors
> >
> > ###Champion
> >
> > TODO
> >
> > ###Nominated Mentors
> >
> > * Sijie Guo, guosijie@gmail.com
> > * Luke Han, lukehan@apache.org
> > * Zheng Shao, zshao@apache.org
> >
> >
> > Mentors must be members of the IPMC and almost always Members of the
> > ASF. At this moment only Luke Han is qualified.
> >
> > Regards,
> > Dave
> >
> >
> > ###Sponsoring Entity
> >
> > We are requesting the Incubator to sponsor this project.
> >
> >
> >
>
>
>


-- 
Ryan Blue
Software Engineer
Netflix

Re: Looking for Champion

Posted by Jim Apple <jb...@cloudera.com.INVALID>.

I'm not a binding vote on incubator entry, but I think it would be
great to have roadmaps as soon as feasible on addressing Tim's concern
(which is deeply related to #2, "Licensing") and on addressing the
code and toil duplication.

On Mon, Jun 18, 2018 at 11:08 AM, Dave Fisher <da...@comcast.net> wrote:
> Hi Li,De -
>
> Since I agreed to champion this project I think that we need a summary about
> what the Incubator PMC cares about in order to accept a podling and what the
> prospective project needs to address. We also need to be clear what should
> happen during Incubation and at what time. I think that many of the
> questions that came up in this thread had to do with assessing how much
> effort it will take to Incubate Palo (or whatever the name will be).
>
> (1) The name Palo. Since there seems to be an issue with that name we should
> have a new name. It is not unknown for a podling to change its name, but
> that does generate extra work for Infrastructure to change the name after
> podling start up. It would be our preference for Palo to find a new name
> prior to VOTING on the proposal. Please do this elsewhere and come back to
> me with the new name so that I can help with the updated proposal.
>
> (2) Licensing of the software. Several bits came up as questionable.
> Regardless of cleanup that has already occurred we have identified that we
> will need to be very careful. It will be important to discuss and carefully
> handle the Software Grant Agreement to make sure that the source listed is
> correct. I think that the SGA must come early during incubation.
>
> (3) Relationship with Impala. Palo has apparently forked portions of Impala.
> This means that some are concerned that there is a missed synergy with the
> Apache Impala project. Is there a clean interface that can be built between
> the projects? It would help if the Palo developers would explore this with
> Impala at dev@impala.apache.org.
>
> That said, part of the Incubation process is to learn the Apache Way. IMHO
> it is ok for the relationship between Impala PMC and a podling PPMC to be a
> work in progress.
>
> (4) Currently, Willem, Luke Han and Dave Fisher are qualified to officially
> mentor. I suggest that Sijie Guo and Zheng Shao be included as Initial
> Committers in order to help from within the PPMC.
>
> On Jun 14, 2018, at 11:03 AM, Jim Apple <jb...@cloudera.com.INVALID>
> wrote:
>
> I don't want to be a stickler, but I don't think "For issues mentioned by
> Jim, Todd and Tim, I have replied on last Saturday."
>
> To my email about Palo being an ASF project as a storage system without a
> query engine, you replied only, "We will seriously consider this proposal."
>
> I see no response to Tim's concern that "The code isn't owned by any
> individual, I contributed it to Apache and it's
> free for anyone to do what they want to do with it, but pulling in
> improvements from other projects without any attempt to attribute it or
> contribute improvements back seems contrary to the Apache way.”
>
>
> Jim - do you need answers to these concerns prior to agreeing to accept this
> project into the Incubator?
>
> Regards,
> Dave
>
>
> On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <li...@baidu.com> wrote:
>
> Hi all,
>
> About Palo, we have fixed the following issues.
>
> 1. Related to Impala
> For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday.
>
> 2. License issues
> For the issues mentioned by Todd and Ted:
> 1) be/aes/* comes from mysql-5.6, GPL v2.1 license
> Fixed: removed the AES-related code.
> https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
> https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced
>
> 2) be/util/mysql_dtoa.cpp was copied from MySQL, GPL license
> Fixed: removed the mysql_dtoa-related code.
> https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1
>
> 3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
> Fixed: restored the original license; we are looking for another HTTP
> server to replace it.
> https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831
>
> 4) be/rpc/*
> Fixed: We have replaced it with brpc, and we will remove Hypertable in a
> few weeks, once users have upgraded to brpc.
> https://github.com/baidu/palo/tree/master/be/src/rpc
>
> 3. Dependency licenses
> For the issue mentioned by Dave: it looks like Palo does not depend on
> OpenLdap and cyrus-sasl directly, but some third-party libraries
> (libcurl and gperftools, for instance) need them to compile.
> For rapidjson, we are looking for an alternative.
>
> 4. About the name of Palo
> For the issue mentioned by Julian:
> We are working on finding a better one.
>
> Best Regards,
> Reed
>
>
>
> On 2018/6/13 at 8:54 AM, "Li,De(BDG)" <li...@baidu.com> wrote:
>
> Hi Julian,
>
> Thank you.
>
> It looks like we have to find another name.
> If anyone has a good name, please feel free to let me know.
>
> Best Regards,
> Reed
>
> On 2018/6/13 at 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:
>
> Note that there is an existing database product called Palo - an open
> source OLAP engine by German company Jedox[1]. There is a high
> likelihood that Palo would have to change its name during incubation, if
> accepted.
>
> Julian
>
> [1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
>
>
>
> On Jun 10, 2018, at 3:49 AM, Han Luke <lu...@gmail.com> wrote:
>
> Cool Dave, it’s great to have you as the Champion.
>
>
> ________________________________
> From: Tan,Zhongyi <tanzhongyi@baidu.com>
> Sent: Saturday, June 9, 2018 8:16:28 AM
> To: general@incubator.apache.org
> Subject: Re: Looking for Champion
>
> Thanks, Willem.
>
> We really appreciate it.
>
> On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:
>
> Hi,
>
> I'm willing to be the Mentor.
> Please count me in.
>
>
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net>
> wrote:
>
> Hi -
>
> I’m willing to Champion and Mentor. I have a couple of comments inline.
> I’ll look at dependency licenses later today. It’s early for me.
>
>
> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
>
> Hi all,
>
> I am Reed, a developer working with the team for Palo (an MPP-based
> interactive SQL data warehouse).
>
> https://github.com/baidu/palo/wiki/Palo-Overview
>
> We propose to contribute Palo as an Apache Incubator project, and
> we are still looking for a possible Champion if anyone would like to
> volunteer. Thanks a lot.
>
>
> Best Regards,
> Reed
>
> ===================
> The draft of the proposal is below:
>
> #Apache Palo
>
> ##Abstract
>
> Palo is an MPP-based interactive SQL data warehouse for reporting and
> analysis.
>
>
> ##Proposal
>
> We propose to contribute the Palo codebase and associated artifacts
> (e.g. documentation, web-site content etc.) to the Apache Software
> Foundation with the intent of forming a productive, meritocratic and
> open community around Palo’s continued development, according to the
> ‘Apache Way’.
>
>
> Baidu owns several trademarks regarding Palo, and proposes to transfer
> ownership of those trademarks in full to the ASF.
>
>
> ###Overview of Palo
>
> Palo’s implementation consists of two daemons: Frontend (FE) and
> Backend (BE).
>
>
> **Frontend daemon** consists of a query coordinator and a catalog
> manager. The query coordinator is responsible for receiving users’ SQL
> queries, compiling them and managing their execution. The catalog manager
> is responsible for managing metadata such as databases, tables,
> partitions and replicas. Several frontend daemons can be deployed to
> guarantee fault tolerance and load balancing.
>
>
> **Backend daemon** stores the data and executes the query fragments.
> Many backend daemons can also be deployed to provide scalability and
> fault tolerance.
>
>
> A typical Palo cluster generally consists of several frontend daemons
> and dozens to hundreds of backend daemons.
>
>
> Users can use MySQL client tools to connect to any frontend daemon and
> submit SQL queries. The Frontend receives a query and compiles it into
> query plans executable by the Backend. The Frontend then sends the query
> plan fragments to the Backend, which builds a query execution DAG. Data
> is fetched and pipelined into the DAG. The final result is sent to the
> client via the Frontend. The distribution of query fragment execution has
> minimizing data movement and maximizing scan locality as its main goals.
>
>
> ##Background
>
> At Baidu, prior to Palo, different tools were deployed to meet diverse
> requirements in different ways. When a use case required the
> simultaneous availability of capabilities that could not all be
> provided by a single tool, users were forced to build hybrid
> architectures that stitched multiple tools together. We believe that
> they shouldn’t need to accept such inherent complexity. A storage
> system built to provide great performance across a broad range of
> workloads provides a more elegant solution to the problems that hybrid
> architectures aim to solve. Palo is that solution.
>
> Palo is designed to be a simple, single, tightly coupled system that
> does not depend on other systems. Palo provides high-concurrency,
> low-latency point query performance as well as high-throughput ad-hoc
> analysis queries. It supports both bulk-batch data loading and near
> real-time mini-batch data loading. Palo also provides high
> availability, reliability, fault tolerance, and scalability.
>
>
> ##Rationale
>
> Palo mainly integrates the technology of Google Mesa and Apache
> Impala.
>
> Mesa is a highly scalable analytic data storage system that stores
> critical measurement data related to Google's Internet advertising
> business. Mesa is designed to satisfy a complex and challenging set of
> user and system requirements, including near real-time data ingestion
> and query ability, as well as high availability, reliability, fault
> tolerance, and scalability for large data and query volumes.
>
> Impala is a modern, open-source MPP SQL engine architected from the
> ground up for the Hadoop data processing environment. By virtue of its
> superior performance and rich functionality, Impala is now comparable
> to many commercial MPP database query engines. Mesa can satisfy many of
> our storage requirements, but Mesa itself does not provide a SQL query
> engine; Impala is a very good MPP SQL query engine, but it lacks a
> perfect distributed storage engine. So in the end we chose to combine
> these two technologies.
>
> Learning from Mesa’s data model, we developed a distributed storage
> engine. Unlike Mesa, this storage engine does not rely on any
> distributed file system. We then deeply integrated this storage engine
> with the Impala query engine. Query compiling, query execution
> coordination, and catalog management of the storage engine are
> integrated into the frontend daemon; query execution and data storage
> are integrated into the backend daemon. With this integration, we
> implemented a single, full-featured, high-performance,
> state-of-the-art MPP database while maintaining simplicity.
>
>
> ##Current Status
>
> Palo has been an open source project on GitHub
> (https://github.com/baidu/palo).
>
>
> ###Meritocracy
>
> Palo has been deployed in production at Baidu and is used by more than
> 200 lines of business. It has demonstrated great performance benefits
> and has proved to be a better way to do reporting and analysis on big
> data. Still, we look forward to growing a rich user and developer
> community.
>
>
> ###Community
>
> Palo seeks to develop developer and user communities during
> incubation.
>
> ###Core Developers
>
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>
>
> ###Alignment
>
> Palo is related to several other Apache projects:
>
> * Palo can also read data stored in Apache Hadoop clusters powered by
> the HDFS filesystem.
>
> * Palo is closely integrated with Impala, which is also being proposed
> to the Incubator.
>
> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>
> * Palo uses Apache Thrift as its RPC and serialization framework of
> choice.
>
>
> ##Known Risks
>
> ###Orphaned Products
>
> The core developers of the Palo team plan to work full time on this
> project. There is very little risk of Palo being orphaned, since at
> least one large company (Baidu) uses it extensively in production:
> currently there are more than 200 use cases running on Palo in
> production. Furthermore, since Palo was open sourced at the beginning
> of October 2017, it has received more than 660 stars and been forked
> nearly 170 times. We plan to extend and diversify this community
> further through Apache.
>
>
> ###Inexperience with Open Source
>
> The core developers are all active users and followers of open
> source. They are already committers and contributors to the Palo
> GitHub project. All have been involved with source code that has been
> released under an open source license, and several of them also have
> experience developing code in an open source environment. Though the
> core developers do not have Apache open source experience, there are
> plans to onboard individuals with Apache open source experience onto
> the project.
>
>
> ###Homogenous Developers
>
> Most of the core developers are from Baidu, but since Palo was open
> sourced it has received many bug fixes and enhancements from
> developers not working at Baidu.
>
>
> ###Reliance on Salaried Developers
>
> Baidu invested in Palo as its OLAP solution, and some of its key
> engineers are working full time on the project. In addition, since
> there is a growing Big Data need for scalable OLAP solutions, we look
> forward to other Apache developers and researchers contributing to
> the project. Key to addressing the risk associated with relying on
> salaried developers from a single entity is to increase the diversity
> of the contributors and actively lobby for domain experts in the BI
> space to contribute. Apache Palo intends to do this.
>
>
> ###An Excessive Fascination with the Apache Brand
>
> Palo is proposing to enter incubation at Apache in order to help
> efforts to diversify the committer base, not so much to capitalize on
> the Apache brand. The Palo project is already in production use inside
> Baidu, but is not expected to be a Baidu product for external
> customers. As such, the Palo project is not seeking to use the Apache
> brand as a marketing tool.
>
>
> ##Documentation
>
> Information about Palo can be found at https://github.com/baidu/palo.
> The following links provide more information about Palo in open
> source:
>
>
> * Palo wiki site: https://github.com/baidu/palo/wiki
> * Codebase at Github: https://github.com/baidu/palo
> * Issue Tracking: https://github.com/baidu/palo/issues
> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>
> ##Initial Source
>
> Palo has been under development since 2017 by a team of engineers at
> Baidu Inc. It is currently hosted on GitHub under an Apache license at
> https://github.com/baidu/palo.
>
>
> ##External Dependencies
>
> Palo has the following external dependencies.
>
> * Google gflags (BSD)
> * Google glog (BSD)
> * Apache Thrift (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Boost (Boost Software License)
> * OpenLdap (OpenLDAP Software License)
> * rapidjson (Tencent)
> * Google RE2 (BSD-style)
> * lz4 (BSD)
> * snappy (BSD)
> * cyrus-sasl (CMU License)
> * Twitter Bootstrap (Apache Software License v2.0)
> * d3 (BSD)
> * LLVM (BSD-like)
>
> Build and test dependencies:
>
> * ant (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * cmake (BSD)
> * clang (BSD)
> * Google gtest (Apache Software License v2.0)
>
> ##Required Resources
>
> ###Mailing List
>
> There are currently no mailing lists. The usual mailing lists are
> expected to be set up when entering incubation:
>
> private@palo.incubator.apache.org
> dev@palo.incubator.apache.org
> commits@palo.incubator.apache.org
>
>
> ###Subversion Directory
>
> Upon entering incubation: https://github.com/baidu/palo.
> After incubation, we want to move the existing repo from
> https://github.com/baidu/palo to Apache infrastructure.
>
>
> ###Issue Tracking
>
> Palo currently uses GitHub to track issues. We would like to continue
> to do so while we discuss migration possibilities with the ASF Infra
> committee.
>
>
> ###Other Resources
>
> The existing code already has unit tests, so we will make use of the
> existing Apache continuous testing infrastructure. The resulting load
> should not be very large.
>
>
> ##Initial Committers
>
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>
>
> ##Affiliations
>
> The initial committers are employees of Baidu Inc. The nominated
> mentors are employees of TODO.
>
>
> ##Sponsors
>
> ###Champion
>
> TODO
>
> ###Nominated Mentors
>
> * sijie guo, guosijie@gmail.com
> * Luke Han, lukehan@apache.org
> * Zheng Shao, zshao@apache.org
>
>
> Mentors must be members of the IPMC and almost always Members of the
> ASF.
>
> At this moment only Luke Han is qualified.
>
> Regards,
> Dave
>
>
> ###Sponsoring Entity
>
> We are requesting the Incubator to sponsor this project.
>
>
>
>
>
>
>


Re: Looking for Champion

Posted by Dave Fisher <da...@comcast.net>.

Hi Li,De -

Since I agreed to champion this project, I think that we need a summary of what the Incubator PMC cares about in order to accept a podling, and of what the prospective project needs to address. We also need to be clear about what should happen during Incubation and when. I think that many of the questions that came up in this thread had to do with assessing how much effort it will take to Incubate Palo (or whatever the name will be).

(1) The name Palo. Since there seems to be an issue with that name we should have a new name. It is not unknown for a podling to change its name, but that does generate extra work for Infrastructure to change the name after podling start up. It would be our preference for Palo to find a new name prior to VOTING on the proposal. Please do this elsewhere and come back to me with the new name so that I can help with the updated proposal.

(2) Licensing of the software. Several bits came up as questionable. Regardless of cleanup that has already occurred we have identified that we will need to be very careful. It will be important to discuss and carefully handle the Software Grant Agreement to make sure that the source listed is correct. I think that the SGA must come early during incubation.

(3) Relationship with Impala. Palo has apparently forked portions of Impala. This means that some are concerned that there is a missed synergy with the Apache Impala project. Is there a clean interface that can be built between the projects? It would help if the Palo developers would explore this with Impala at dev@impala.apache.org.

That said, part of the Incubation process is to learn the Apache Way. IMHO it is ok for the relationship between the Impala PMC and a podling PPMC to be a work in progress.

(4) Currently, Willem, Luke Han and Dave Fisher are qualified to officially mentor. I suggest that Sijie Guo and Zheng Shao be included as Initial Committers in order to help from within the PPMC.

> On Jun 14, 2018, at 11:03 AM, Jim Apple <jb...@cloudera.com.INVALID> wrote:
> 
> I don't want to be a stickler, but I don't think "For issues mentioned by
> Jim, Todd and Tim, I have replied on last Saturday."
> 
> To my email about Palo being an ASF project as a storage system without a
> query engine, you replied only, "We will seriously consider this proposal."
> 
> I see no response to Tim's concern that "The code isn't owned by any
> individual, I contributed it to Apache and it's
> free for anyone to do what they want to do with it, but pulling in
> improvements from other projects without any attempt to attribute it or
> contribute improvements back seems contrary to the Apache way.”

Jim - do you need answers to these concerns prior to agreeing to accept this project into the Incubator?

Regards,
Dave

> 
> On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <li...@baidu.com> wrote:
> 
>> Hi all,
>> 
>> About Palo, we have fixed the following issues.
>> 
>> 1. Related to Impala
>> For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday.
>> 
>> 2. License issues
>> For the issues mentioned by Todd and Ted:
>> 1) be/aes/* comes from mysql-5.6, GPL v2.1 license
>> Fixed: removed the aes-related code.
>> https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
>> https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced
>> 
>> 2) be/util/mysql_dtoa.cpp is copied from MySQL, GPL license
>> Fixed: removed the mysql_dtoa-related code.
>> https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1
>> 
>> 3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
>> Fixed: restored the original license; we are looking for another HTTP server
>> to replace it.
>> https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831
>> 
>> 4) be/rpc/*
>> Fixed: we have replaced it with brpc, and we will remove Hypertable in a
>> few weeks, once users have upgraded to brpc.
>> https://github.com/baidu/palo/tree/master/be/src/rpc
>> 
>> 3. Dependency licenses
>> For the issue mentioned by Dave: it looks like Palo does not depend on
>> OpenLdap and cyrus-sasl directly,
>> but some third-party libraries, libcurl and gperftools for instance,
>> need them to compile.
>> For rapidjson, we are looking for an alternative.
>> 
>> 4. About the name of Palo
>> For the issue mentioned by Julian:
>> we are figuring out a better one.
>> 
>> Best Regards,
>> Reed
>> 
>> 
>> 
>> On 2018/6/13, 8:54 AM, "Li,De(BDG)" <li...@baidu.com> wrote:
>> 
>>> Hi Julian,
>>> 
>>> Thank you.
>>> 
>>> It looks like we have to find another one.
>>> If anyone has a good name, please feel free to let me know.
>>> 
>>> Best Regards,
>>> Reed
>>> 
>>> On 2018/6/13, 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:
>>> 
>>>> Note that there is an existing database product called Palo - an open
>>>> source OLAP engine by German company Jedox[1]. There is a high
>>>> likelihood that Palo would have to change its name during incubation, if
>>>> accepted.
>>>> 
>>>> Julian
>>>> 
>>>> [1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
>>>> 
>>>> 
>>>> 
>>>>> On Jun 10, 2018, at 3:49 AM, Han Luke <lu...@gmail.com> wrote:
>>>>> 
>>>>> Cool Dave, it’s great to have you as the Champion.
>>>>> 
>>>>> 
>>>>> ________________________________
>>>>> From: Tan,Zhongyi <tanzhongyi@baidu.com>
>>>>> Sent: Saturday, June 9, 2018 8:16:28 AM
>>>>> To: general@incubator.apache.org
>>>>> Subject: Re: Looking for Champion
>>>>> 
>>>>> Thanks, Willem.
>>>>> 
>>>>> We appreciate it very much.
>>>>> 
>>>>>> On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:
>>>>>> 
>>>>>> Hi,
>>>>>> 
>>>>>> I'm willing to be the Mentor.
>>>>>> Please count me in.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> Willem Jiang
>>>>>> 
>>>>>> Twitter: willemjiang
>>>>>> Weibo: 姜宁willem
>>>>>> 
>>>>>>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net>
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi -
>>>>>>> 
>>>>>>> I’m willing to Champion and Mentor. I have a couple of comments
>>>>>>> inline.
>>>>>>> I’ll look at dependency licenses later today. It’s early for me.
>>>>>>> 
>>>>>>> 
>>>>>>>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
>>>>>>>> 
>>>>>>>> * Palo is closely integrated with Impala, which is also being
>>>>>>>> proposed to the Incubator.
>>>>>>> 
>>>>>>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>>>>>>> 
>>>>>>>> * Palo uses Apache Thrift as its RPC and serialization framework of
>>>>>>> choice.
>>>>>>>> 
>>>>>>>> ##Known Risks
>>>>>>>> 
>>>>>>>> ###Orphaned Products
>>>>>>>> 
>>>>>>>> The core developers of Palo team plan to work full time on this
>>>>>>>> project.
>>>>>>> There is very little risk of Palo getting orphaned since at least one
>>>>>>> large
>>>>>>> company (Baidu) is extensively using it in their production. For
>>>>>>> example,
>>>>>>> currently there are more than 200 use cases using Palo in production.
>>>>>>> Furthermore, since Palo was open sourced at the beginning of October
>>>>>>> 2017,
>>>>>>> it has received more than 660 stars and been forked nearly 170 times.
>>>>>>> We
>>>>>>> plan to extend and diversify this community further through Apache.
>>>>>>>> 
>>>>>>>> ###Inexperience with Open Source
>>>>>>>> 
>>>>>>>> The core developers are all active users and followers of open
>>>>>>>> source.
>>>>>>> They are already committers and contributors to the Palo Github
>>>>>>> project.
>>>>>>> All have been involved with the source code that has been released
>>>>>>> under an
>>>>>>> open source license, and several of them also have experience
>>>>>>> developing
>>>>>>> code in an open source environment. Though the core set of Developers
>>>>>>> do
>>>>>>> not have Apache Open Source experience, there are plans to onboard
>>>>>>> individuals with Apache open source experience on to the project.
>>>>>>>> 
>>>>>>>> ###Homogenous Developers
>>>>>>>> 
>>>>>>>> The most of core developers are from Baidu, but after Palo was open
>>>>>>> sourced, Palo received a lot of bug fixes and enhancements from other
>>>>>>> developers not working at Baidu.
>>>>>>>> 
>>>>>>>> ###Reliance on Salaried Developers
>>>>>>>> 
>>>>>>>> Baidu invested in Palo as the OLAP solution and some of its key
>>>>>>> engineers are working full time on the project. In addition, since
>>>>>>> there is
>>>>>>> a growing Big Data need for scalable OLAP solutions, we look forward
>>>>>>> to
>>>>>>> other Apache developers and researchers to contribute to the project.
>>>>>>> Also
>>>>>>> key to addressing the risk associated with relying on Salaried
>>>>>>> developers
>>>>>>> from a single entity is to increase the diversity of the contributors
>>>>>>> and
>>>>>>> actively lobby for Domain experts in the BI space to contribute.
>>>>>>> Apache
>>>>>>> Palo intends to do this.
>>>>>>>> 
>>>>>>>> ###An Excessive Fascination with the Apache Brand
>>>>>>>> 
>>>>>>>> Palo is proposing to enter incubation at Apache in order to help
>>>>>>>> efforts
>>>>>>> to diversify the committer-base, not so much to capitalize on the
>>>>>>> Apache
>>>>>>> brand. The Palo project is in production use already inside Baidu,
>>>>>>> but is
>>>>>>> not expected to be an Baidu product for external customers. As such,
>>>>>>> the
>>>>>>> Palo project is not seeking to use the Apache brand as a marketing
>>>>>>> tool.
>>>>>>>> 
>>>>>>>> ##Documentation
>>>>>>>> 
>>>>>>>> Information about Palo can be found at
>>>>>>>> https://github.com/baidu/palo.
>>>>>>> The following links provide more information about Palo in open
>>>>>>> source:
>>>>>>>> 
>>>>>>>> * Palo wiki site: https://github.com/baidu/palo/wiki
>>>>>>>> * Codebase at Github: https://github.com/baidu/palo
>>>>>>>> * Issue Tracking: https://github.com/baidu/palo/issues
>>>>>>>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>>>>>>>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>>>>>>>> 
>>>>>>>> ##Initial Source
>>>>>>>> 
>>>>>>>> Palo has been under development since 2017 by a team of engineers at
>>>>>>> Baidu Inc. It is currently hosted on Github.com under an Apache
>>>>>>> license at
>>>>>>> https://github.com/baidu/palo.
>>>>>>>> 
>>>>>>>> ##External Dependencies
>>>>>>>> 
>>>>>>>> Palo has the following external dependencies.
>>>>>>>> 
>>>>>>>> * Google gflags (BSD)
>>>>>>>> * Google glog (BSD)
>>>>>>>> * Apache Thrift (Apache Software License v2.0)
>>>>>>>> * Apache Commons (Apache Software License v2.0)
>>>>>>>> * Boost (Boost Software License)
>>>>>>>> * OpenLdap (OpenLDAP Software License)
>>>>>>>> * rapidjson (Tencent)
>>>>>>>> * Google RE2 (BSD-style)
>>>>>>>> * lz4 (BSD)
>>>>>>>> * snappy (BSD)
>>>>>>>> * cyrus-sasl (CMU License)
>>>>>>>> * Twitter Bootstrap (Apache Software License v2.0)
>>>>>>>> * d3 (BSD)
>>>>>>>> * LLVM (BSD-like)
>>>>>>>> 
>>>>>>>> Build and test dependencies:
>>>>>>>> 
>>>>>>>> * ant (Apache Software License v2.0)
>>>>>>>> * Apache Maven (Apache Software License v2.0)
>>>>>>>> * cmake (BSD)
>>>>>>>> * clang (BSD)
>>>>>>>> * Google gtest (Apache Software License v2.0)
>>>>>>>> 
>>>>>>>> ##Required Resources
>>>>>>>> 
>>>>>>>> ###Mailing List
>>>>>>>> 
>>>>>>>> There are currently no mailing lists. The usual mailing lists are
>>>>>>> expected to be set up when entering incubation:
>>>>>>>> 
>>>>>>>> private@palo.incubator.apache.org<mailto:private@palo.
>>>>>>> incubator.apache.org>
>>>>>>>> dev@palo.incubator.apache.org<ma...@palo.incubator.apache.org>
>>>>>>>> commits@palo.incubator.apache.org<mailto:commits@palo.
>>>>>>> incubator.apache.org>
>>>>>>>> 
>>>>>>>> ###Subversion Directory
>>>>>>>> 
>>>>>>>> Upon entering incubation: https://github.com/baidu/palo.
>>>>>>>> After incubation, we want to move the existing repo from
>>>>>>> https://github.com/baidu/palo to Apache infrastructure.
>>>>>>>> 
>>>>>>>> ###Issue Tracking
>>>>>>>> 
>>>>>>>> Palo currently uses GitHub to track issues. Would like to continue
>>>>>>>> to do
>>>>>>> so while we discuss migration possibilities with the ASF Infra
>>>>>>> committee.
>>>>>>>> 
>>>>>>>> ###Other Resources
>>>>>>>> 
>>>>>>>> The existing code already has unit tests so we will make use of
>>>>>>>> existing
>>>>>>> Apache continuous testing infrastructure. The resulting load should
>>>>>>> not be
>>>>>>> very large.
>>>>>>>> 
>>>>>>>> ##Initial Committers
>>>>>>>> 
>>>>>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>>>>>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>>>>>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>>>>>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>>>>>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>>>>>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>>>>>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>>>>>>>> 
>>>>>>>> ##Affiliations
>>>>>>>> 
>>>>>>>> The initial committers are employees of Baidu Inc. The nominated
>>>>>>> mentors are employees of TODO.
>>>>>>>> 
>>>>>>>> ##Sponsors
>>>>>>>> 
>>>>>>>> ###Champion
>>>>>>>> 
>>>>>>>> TODO
>>>>>>>> 
>>>>>>>> ###Nominated Mentors
>>>>>>>> 
>>>>>>>> * Sijie Guo, guosijie@gmail.com
>>>>>>>> * Luke Han, lukehan@apache.org
>>>>>>>> * Zheng Shao, zshao@apache.org
>>>>>>> 
>>>>>>> Mentors must be members of the IPMC and almost always Members of the
>>>>>>> ASF.
>>>>>>> 
>>>>>>> At this moment only Luke Han is qualified.
>>>>>>> 
>>>>>>> Regards,
>>>>>>> Dave
>>>>>>> 
>>>>>>>> 
>>>>>>>> ###Sponsoring Entity
>>>>>>>> 
>>>>>>>> We are requesting the Incubator to sponsor this project.
>>>>>>> 
>>>>>>> 
>> 
>>

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Jim,

If you have any questions or suggestions about this roadmap, please feel
free to let us know.

Best Regards,
Reed 

On 2018/6/21 8:06 PM, "Li,De(BDG)" <li...@baidu.com> wrote:

>We have a general plan, or roadmap:
>
>1. Identify the code, modules, and components that duplicate Impala.
>2. Determine which features could be merged into Impala with the Impala
>community's support.
>3. Clearly define the interface between the query engine and the other
>components, such as the storage engine, metadata management, MySQL server,
>load and export modules, web server, and so on.
>4. Separate the Impala query engine from Palo.
>
>
>Best Regards,
>Reed
>
>On 2018/6/19 9:48 PM, "Jim Apple" <jb...@cloudera.com.INVALID> wrote:
>
>>>
>>> I'm not sure if Palo is just a storage system but definitely we will
>>> separate query engine from Palo.
>>>
>>
>>That's great news, and I think it will benefit users of Impala and Palo.
>>
>>
>>> Of course, as you mentioned, "this could be a lot of work", so it will
>>> take a long time and we also hope that Impala community could support
>>>us.
>>>
>>
>>Yes, I expect so.
>

Re: Looking for Champion

Posted by Jim Apple <jb...@cloudera.com.INVALID>.

Looks great!

On Thu, Jun 21, 2018 at 5:06 AM Li,De(BDG) <li...@baidu.com> wrote:

> We have a general plan, or roadmap:
>
> 1. Identify the code, modules, and components that duplicate Impala.
> 2. Determine which features could be merged into Impala with the Impala
> community's support.
> 3. Clearly define the interface between the query engine and the other
> components, such as the storage engine, metadata management, MySQL server,
> load and export modules, web server, and so on.
> 4. Separate the Impala query engine from Palo.
>
>
> Best Regards,
> Reed
>
> On 2018/6/19 9:48 PM, "Jim Apple" <jb...@cloudera.com.INVALID> wrote:
>
> >>
> >> I'm not sure if Palo is just a storage system but definitely we will
> >> separate query engine from Palo.
> >>
> >
> >That's great news, and I think it will benefit users of Impala and Palo.
> >
> >
> >> Of course, as you mentioned, "this could be a lot of work", so it will
> >> take a long time and we also hope that Impala community could support
> >>us.
> >>
> >
> >Yes, I expect so.
>
>

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

We have a general plan, or roadmap:

1. Identify the code, modules, and components that duplicate Impala.
2. Determine which features could be merged into Impala with the Impala
community's support.
3. Clearly define the interface between the query engine and the other
components, such as the storage engine, metadata management, MySQL server,
load and export modules, web server, and so on.
4. Separate the Impala query engine from Palo.

Best Regards,
Reed

On 2018/6/19 9:48 PM, "Jim Apple" <jb...@cloudera.com.INVALID> wrote:

>>
>> I'm not sure if Palo is just a storage system but definitely we will
>> separate query engine from Palo.
>>
>
>That's great news, and I think it will benefit users of Impala and Palo.
>
>
>> Of course, as you mentioned, "this could be a lot of work", so it will
>> take a long time and we also hope that Impala community could support
>>us.
>>
>
>Yes, I expect so.

Re: Looking for Champion

Posted by Jim Apple <jb...@cloudera.com.INVALID>.

>
> I'm not sure if Palo is just a storage system but definitely we will
> separate query engine from Palo.
>

That's great news, and I think it will benefit users of Impala and Palo.


> Of course, as you mentioned, "this could be a lot of work", so it will
> take a long time and we also hope that Impala community could support us.
>

Yes, I expect so.

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Jim,

Thank you for your response.

We have reflected on your suggestion, and we agree with you for the most
part.
We will try to find or define a clean interface between Palo and Impala,
and to determine which parts should stay in Palo and which parts should
become patches for Impala.
I'm not sure if Palo is just a storage system but definitely we will
separate query engine from Palo.
Of course, as you mentioned, "this could be a lot of work", so it will
take a long time and we also hope that Impala community could support us.

As for Tim's concern, as I said in another email, what we have done so far
may not have followed the Apache way, but I think this is a good
opportunity for us to participate in the open source community and learn
to do things the Apache way, including how to cooperate with the Impala
community (indeed, we have lacked interaction).

Best Regards,
Reed



On 2018/6/15 2:03 AM, "Jim Apple" <jb...@cloudera.com.INVALID> wrote:

>I don't want to be a stickler, but I don't think "For issues mentioned by
>Jim, Todd and Tim, I have replied on last Saturday."
>
>To my email about Palo being an ASF project as a storage system without a
>query engine, you replied only, "We will seriously consider this
>proposal."
>
>I see no response to Tim's concern that "The code isn't owned by any
>individual, I contributed it to Apache and it's
>free for anyone to do what they want to do with it, but pulling in
>improvements from other projects without any attempt to attribute it or
>contribute improvements back seems contrary to the Apache way."
>
>On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <li...@baidu.com> wrote:
>
>> Hi all,
>>
>> Regarding Palo, we have fixed the following issues.
>>
>> 1. Impala-related issues
>> For issues mentioned by Jim, Todd and Tim, I have replied on last
>>Saturday.
>>
>> 2. License issues
>> For issues mentioned by Todd and Ted.
>> 1) be/aes/* comes from mysql-5.6, GPL v2.1 license
>> Fixed: removed the AES-related code.
>> https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
>> https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced
>>
>> 2) be/util/mysql_dtoa.cpp was copied from MySQL, GPL license
>> Fixed: removed the mysql_dtoa-related code.
>> https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1
>>
>> 3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
>> Fixed: restored the original license; we are looking for another HTTP
>> server to replace it.
>> https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831
>>
>> 4) be/rpc/*
>> Fixed: we have replaced it with brpc, and we will remove Hypertable in a
>> few weeks, after users have had time to upgrade to brpc.
>> https://github.com/baidu/palo/tree/master/be/src/rpc
>>
>> 3. Dependency licenses
>> For the issue mentioned by Dave: it looks like Palo does not depend on
>> OpenLdap and cyrus-sasl directly, but some third-party libraries need
>> them to compile, libcurl and gperftools for instance.
>> For rapidjson, we are looking for an alternative.
>>
>> 4. About the name of Palo
>> For the issue mentioned by Julian: we are working on a better one.
>>
>> Best Regards,
>> Reed
>>
>>
>>
>> On 2018/6/13 8:54 AM, "Li,De(BDG)" <li...@baidu.com> wrote:
>>
>> >Hi Julian,
>> >
>> >Thank you.
>> >
>> >It looks like we have to find another one.
>> >If anyone has a good name, please feel free to let me know.
>> >
>> >Best Regards,
>> >Reed
>> >
>> >On 2018/6/13 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:
>> >
>> >>Note that there is an existing database product called Palo - an open
>> >>source OLAP engine by German company Jedox[1]. There is a high
>> >>likelihood that Palo would have to change its name during incubation,
>>if
>> >>accepted.
>> >>
>> >>Julian
>> >>
>> >>[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
>> >><https://en.wikipedia.org/wiki/Palo_(OLAP_database)>
>> >>
>> >>
>> >>
>> >>> On Jun 10, 2018, at 3:49 AM, Han Luke <lu...@gmail.com> wrote:
>> >>>
>> >>> Cool Dave, it’s great to have you as the Champion.
>> >>>
>> >>>
>> >>> ________________________________
>> >>> From: Tan,Zhongyi <tanzhongyi@baidu.com>
>> >>> Sent: Saturday, June 9, 2018 8:16:28 AM
>> >>> To: general@incubator.apache.org
>> >>> Subject: Re: Looking for Champion
>> >>>
>> >>> Thanks, Willem.
>> >>>
>> >>> We really appreciate it.
>> >>>
>> >>>> On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:
>> >>>>
>> >>>> Hi,
>> >>>>
>> >>>> I'm willing to be the Mentor.
>> >>>> Please count me in.
>> >>>>
>> >>>>
>> >>>>
>> >>>> Willem Jiang
>> >>>>
>> >>>> Twitter: willemjiang
>> >>>> Weibo: 姜宁willem
>> >>>>
>> >>>>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher
>><da...@comcast.net>
>> >>>>>wrote:
>> >>>>>
>> >>>>> Hi -
>> >>>>>
>> >>>>> I’m willing to Champion and Mentor. I have a couple of comments
>> >>>>>inline.
>> >>>>> I’ll look at dependency licenses later today. It’s early for me.
>> >>>>>
>> >>>>>
>> >>>>>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
>> >>>>>>
>> >>>>>> Hi all,
>> >>>>>>
>> >>>>>> I am Reed, as a developer worked with the team for Palo (a
>>MPP-based
>> >>>>> interactive SQL data warehousing).
>> >>>>>> https://github.com/baidu/palo/wiki/Palo-Overview
>> >>>>>>
>> >>>>>> We propose to contribute Palo as an Apache Incubator project, and
>> >>>>>> we are still looking for possible Champion if anyone would like
>>to
>> >>>>> volunteer. Thanks a lot.
>> >>>>>>
>> >>>>>> Best Regards,
>> >>>>>> Reed
>> >>>>>>
>> >>>>>> ===================
>> >>>>>> The draft of the proposal as below:
>> >>>>>>
>> >>>>>> #Apache Palo
>> >>>>>>
>> >>>>>> ##Abstract
>> >>>>>>
>> >>>>>> Palo is a MPP-based interactive SQL data warehousing for
>>reporting
>> >>>>>>and
>> >>>>> analysis.
>> >>>>>>
>> >>>>>> ##Proposal
>> >>>>>>
>> >>>>>> We propose to contribute the Palo codebase and associated
>>artifacts
>> >>>>> (e.g. documentation, web-site content etc.) to the Apache Software
>> >>>>> Foundation with the intent of forming a productive, meritocratic
>>and
>> >>>>>open
>> >>>>> community around Palo’s continued development, according to the
>> >>>>>‘Apache
>> >>>>> Way’.
>> >>>>>>
>> >>>>>> Baidu owns several trademarks regarding Palo, and proposes to
>> >>>>>>transfer
>> >>>>> ownership of those trademarks in full to the ASF.
>> >>>>>>
>> >>>>>> ###Overview of Palo
>> >>>>>>
>> >>>>>> Palo’s implementation consists of two daemons: Frontend (FE) and
>> >>>>>>Backend
>> >>>>> (BE).
>> >>>>>>
>> >>>>>> **Frontend daemon** consists of query coordinator and catalog
>> >>>>>>manager.
>> >>>>> Query coordinator is responsible for receiving users’ sql queries,
>> >>>>> compiling queries and managing queries execution. Catalog manager
>>is
>> >>>>> responsible for managing metadata such as databases, tables,
>> >>>>>partitions,
>> >>>>> replicas and etc. Several frontend daemons could be deployed to
>> >>>>>guarantee
>> >>>>> fault-tolerance, and load balancing.
>> >>>>>>
>> >>>>>> **Backend daemon** stores the data and executes the query
>>fragments.
>> >>>>> Many backend daemons could also be deployed to provide scalability
>> >>>>>and
>> >>>>> fault-tolerance.
>> >>>>>>
>> >>>>>> A typical Palo cluster generally composes of several frontend
>> >>>>>>daemons
>> >>>>> and dozens to hundreds of backend daemons.
>> >>>>>>
>> >>>>>> Users can use MySQL client tools to connect any frontend daemon
>>to
>> >>>>> submit SQL query. Frontend receives the query and compiles it into
>> >>>>>query
>> >>>>> plans executable by the Backend. Then Frontend sends the query
>>plan
>> >>>>> fragments to Backend. Backend will build a query execution DAG.
>>Data
>> >>>>>is
>> >>>>> fetched and pipelined into the DAG. The final result response is
>>sent
>> >>>>>to
>> >>>>> client via Frontend. The distribution of query fragment execution
>> >>>>>takes
>> >>>>> minimizing data movement and maximizing scan locality as the main
>> >>>>>goal.
>> >>>>>>
>> >>>>>> ##Background
>> >>>>>>
>> >>>>>> At Baidu, Prior to Palo, different tools were deployed to solve
>> >>>>>>diverse
>> >>>>> requirements in many ways. And when a use case requires the
>> >>>>>simultaneous
>> >>>>> availability of capabilities that cannot all be provided by a
>>single
>> >>>>>tool,
>> >>>>> users were forced to build hybrid architectures that stitch
>>multiple
>> >>>>>tools
>> >>>>> together, but we believe that they shouldn’t need to accept such
>> >>>>>inherent
>> >>>>> complexity. A storage system built to provide great performance
>> >>>>>across a
>> >>>>> broad range of workloads provides a more elegant solution to the
>> >>>>>problems
>> >>>>> that hybrid architectures aim to solve. Palo is the solution.
>> >>>>>>
>> >>>>>> Palo is designed to be a simple and single tightly coupled
>>system,
>> >>>>>>not
>> >>>>> depending on other systems. Palo provides high concurrent low
>>latency
>> >>>>>point
>> >>>>> query performance, but also provides high throughput queries of
>> >>>>>ad-hoc
>> >>>>> analysis. Palo provides bulk-batch data loading, but also provides
>> >>>>>near
>> >>>>> real-time mini-batch data loading. Palo also provides high
>> >>>>>availability,
>> >>>>> reliability, fault tolerance, and scalability.
>> >>>>>>
>> >>>>>> ##Rationale
>> >>>>>>
>> >>>>>> Palo mainly integrates the technology of Google Mesa and Apache
>> >>>>>>Impala.
>> >>>>>>
>> >>>>>> Mesa is a highly scalable analytic data storage system that
>>stores
>> >>>>> critical measurement data related to Google's Internet advertising
>> >>>>> business. Mesa is designed to satisfy complex and challenging set
>>of
>> >>>>>users’
>> >>>>> and systems’ requirements, including near real-time data ingestion
>> >>>>>and
>> >>>>> query ability, as well as high availability, reliability, fault
>> >>>>>tolerance,
>> >>>>> and scalability for large data and query volumes.
>> >>>>>>
>> >>>>>> Impala is a modern, open-source MPP SQL engine architected from
>>the
>> >>>>> ground up for the Hadoop data processing environment. At present,
>>by
>> >>>>>virtue
>> >>>>> of its superior performance and rich functionality， Impala has
>>been
>> >>>>> comparable to many commercial MPP database query engine. Mesa can
>> >>>>>satisfy
>> >>>>> the needs of many of our storage requirements, however Mesa itself
>> >>>>>does not
>> >>>>> provide a SQL query engine; Impala is a very good MPP SQL query
>> >>>>>engine, but
>> >>>>> the lack of a perfect distributed storage engine. So in the end we
>> >>>>>chose
>> >>>>> the combination of these two technologies.
>> >>>>>>
>> >>>>>> Learning from Mesa’s data model, we developed a distributed
>>storage
>> >>>>> engine. Unlike Mesa, this storage engine does not rely on any
>> >>>>>distributed
>> >>>>> file system. Then we deeply integrate this storage engine with
>>Impala
>> >>>>>query
>> >>>>> engine. Query compiling, query execution coordination and catalog
>> >>>>> management of storage engine are integrated to be frontend daemon;
>> >>>>>query
>> >>>>> execution and data storage are integrated to be backend daemon.
>>With
>> >>>>>this
>> >>>>> integration, we implemented a single, full-featured, high
>>performance
>> >>>>>state
>> >>>>> the art of MPP database, as well as maintaining the simplicity.
>> >>>>>>
>> >>>>>> ##Current Status
>> >>>>>>
>> >>>>>> Palo has been an open source project on GitHub (
>> >>>>> https://github.com/baidu/palo).
>> >>>>>>
>> >>>>>> ###Meritocracy
>> >>>>>>
>> >>>>>> Palo has been deployed in production at Baidu and is applying
>>more
>> >>>>>>than
>> >>>>> 200 lines of business. It has demonstrated great performance
>>benefits
>> >>>>>and
>> >>>>> has proved to be a better way for reporting and analysis based big
>> >>>>>data.
>> >>>>> Still We look forward to growing a rich user and developer
>>community.
>> >>>>>>
>> >>>>>> ###Community
>> >>>>>>
>> >>>>>> Palo seeks to develop developer and user communities during
>> >>>>>>incubation.
>> >>>>>>
>> >>>>>> ###Core Developers
>> >>>>>>
>> >>>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> >>>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> >>>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> >>>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> >>>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> >>>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> >>>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> >>>>>>
>> >>>>>> ###Alignment
>> >>>>>>
>> >>>>>> Palo is related to several other Apache projects:
>> >>>>>>
>> >>>>>> * Palo can also read data stored in Apache Hadoop clusters
>>powered
>> >>>>>>by
>> >>>>> the HDFS filesystem.
>> >>>>>> * Palo is closely integrated with Impala, which is also being
>> >>>>>>proposed
>> >>>>> to the Incubator.
>> >>>>>
>> >>>>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>> >>>>>
>> >>>>>> * Palo uses Apache Thrift as its RPC and serialization framework
>>of
>> >>>>> choice.
>> >>>>>>
>> >>>>>> ##Known Risks
>> >>>>>>
>> >>>>>> ###Orphaned Products
>> >>>>>>
>> >>>>>> The core developers of Palo team plan to work full time on this
>> >>>>>>project.
>> >>>>> There is very little risk of Palo getting orphaned since at least
>>one
>> >>>>>large
>> >>>>> company (Baidu) is extensively using it in their production. For
>> >>>>>example,
>> >>>>> currently there are more than 200 use cases using Palo in
>>production.
>> >>>>> Furthermore, since Palo was open sourced at the beginning of
>>October
>> >>>>>2017,
>> >>>>> it has received more than 660 stars and been forked nearly 170
>>times.
>> >>>>>We
>> >>>>> plan to extend and diversify this community further through
>>Apache.
>> >>>>>>
>> >>>>>> ###Inexperience with Open Source
>> >>>>>>
>> >>>>>> The core developers are all active users and followers of open
>> >>>>>>source.
>> >>>>> They are already committers and contributors to the Palo Github
>> >>>>>project.
>> >>>>> All have been involved with the source code that has been released
>> >>>>>under an
>> >>>>> open source license, and several of them also have experience
>> >>>>>developing
>> >>>>> code in an open source environment. Though the core set of
>>Developers
>> >>>>>do
>> >>>>> not have Apache Open Source experience, there are plans to onboard
>> >>>>> individuals with Apache open source experience on to the project.
>> >>>>>>
>> >>>>>> ###Homogenous Developers
>> >>>>>>
>> >>>>>> The most of core developers are from Baidu, but after Palo was
>>open
>> >>>>> sourced, Palo received a lot of bug fixes and enhancements from
>>other
>> >>>>> developers not working at Baidu.
>> >>>>>>
>> >>>>>> ###Reliance on Salaried Developers
>> >>>>>>
>> >>>>>> Baidu invested in Palo as the OLAP solution and some of its key
>> >>>>> engineers are working full time on the project. In addition, since
>> >>>>>there is
>> >>>>> a growing Big Data need for scalable OLAP solutions, we look
>>forward
>> >>>>>to
>> >>>>> other Apache developers and researchers to contribute to the
>>project.
>> >>>>>Also
>> >>>>> key to addressing the risk associated with relying on Salaried
>> >>>>>developers
>> >>>>> from a single entity is to increase the diversity of the
>>contributors
>> >>>>>and
>> >>>>> actively lobby for Domain experts in the BI space to contribute.
>> >>>>>Apache
>> >>>>> Palo intends to do this.
>> >>>>>>
>> >>>>>> ###An Excessive Fascination with the Apache Brand
>> >>>>>>
>> >>>>>> Palo is proposing to enter incubation at Apache in order to help
>> >>>>>>efforts
>> >>>>> to diversify the committer-base, not so much to capitalize on the
>> >>>>>Apache
>> >>>>> brand. The Palo project is in production use already inside Baidu,
>> >>>>>but is
>> >>>>> not expected to be an Baidu product for external customers. As
>>such,
>> >>>>>the
>> >>>>> Palo project is not seeking to use the Apache brand as a marketing
>> >>>>>tool.
>> >>>>>>
>> >>>>>> ##Documentation
>> >>>>>>
>> >>>>>> Information about Palo can be found at
>> >>>>>>https://github.com/baidu/palo.
>> >>>>> The following links provide more information about Palo in open
>> >>>>>source:
>> >>>>>>
>> >>>>>> * Palo wiki site: https://github.com/baidu/palo/wiki
>> >>>>>> * Codebase at Github: https://github.com/baidu/palo
>> >>>>>> * Issue Tracking: https://github.com/baidu/palo/issues
>> >>>>>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> >>>>>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> >>>>>>
>> >>>>>> ##Initial Source
>> >>>>>>
>> >>>>>> Palo has been under development since 2017 by a team of
>>engineers at
>> >>>>> Baidu Inc. It is currently hosted on Github.com under an Apache
>> >>>>>license at
>> >>>>> https://github.com/baidu/palo.
>> >>>>>>
>> >>>>>> ##External Dependencies
>> >>>>>>
>> >>>>>> Palo has the following external dependencies.
>> >>>>>>
>> >>>>>> * Google gflags (BSD)
>> >>>>>> * Google glog (BSD)
>> >>>>>> * Apache Thrift (Apache Software License v2.0)
>> >>>>>> * Apache Commons (Apache Software License v2.0)
>> >>>>>> * Boost (Boost Software License)
>> >>>>>> * OpenLdap (OpenLDAP Software License)
>> >>>>>> * rapidjson (Tencent)
>> >>>>>> * Google RE2 (BSD-style)
>> >>>>>> * lz4 (BSD)
>> >>>>>> * snappy (BSD)
>> >>>>>> * cyrus-sasl (CMU License)
>> >>>>>> * Twitter Bootstrap (Apache Software License v2.0)
>> >>>>>> * d3 (BSD)
>> >>>>>> * LLVM (BSD-like)
>> >>>>>>
>> >>>>>> Build and test dependencies:
>> >>>>>>
>> >>>>>> * ant (Apache Software License v2.0)
>> >>>>>> * Apache Maven (Apache Software License v2.0)
>> >>>>>> * cmake (BSD)
>> >>>>>> * clang (BSD)
>> >>>>>> * Google gtest (Apache Software License v2.0)
>> >>>>>>
>> >>>>>> ##Required Resources
>> >>>>>>
>> >>>>>> ###Mailing List
>> >>>>>>
>> >>>>>> There are currently no mailing lists. The usual mailing lists are
>> >>>>> expected to be set up when entering incubation:
>> >>>>>>
>> >>>>>> private@palo.incubator.apache.org
>> >>>>>> dev@palo.incubator.apache.org
>> >>>>>> commits@palo.incubator.apache.org
>> >>>>>>
>> >>>>>> ###Subversion Directory
>> >>>>>>
>> >>>>>> Upon entering incubation: https://github.com/baidu/palo.
>> >>>>>> After incubation, we want to move the existing repo from
>> >>>>> https://github.com/baidu/palo to Apache infrastructure.
>> >>>>>>
>> >>>>>> ###Issue Tracking
>> >>>>>>
>> >>>>>> Palo currently uses GitHub to track issues. We would like to
>> >>>>>> continue to do so while we discuss migration possibilities with
>> >>>>>> the ASF Infra committee.
>> >>>>>>
>> >>>>>> ###Other Resources
>> >>>>>>
>> >>>>>> The existing code already has unit tests so we will make use of
>> >>>>>>existing
>> >>>>> Apache continuous testing infrastructure. The resulting load
>>should
>> >>>>>not be
>> >>>>> very large.
>> >>>>>>
>> >>>>>> ##Initial Committers
>> >>>>>>
>> >>>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> >>>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> >>>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> >>>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> >>>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> >>>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> >>>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> >>>>>>
>> >>>>>> ##Affiliations
>> >>>>>>
>> >>>>>> The initial committers are employees of Baidu Inc. The nominated
>> >>>>> mentors are employees of TODO.
>> >>>>>>
>> >>>>>> ##Sponsors
>> >>>>>>
>> >>>>>> ###Champion
>> >>>>>>
>> >>>>>> TODO
>> >>>>>>
>> >>>>>> ###Nominated Mentors
>> >>>>>>
>> >>>>>> * Sijie Guo, guosijie@gmail.com
>> >>>>>> * Luke Han, lukehan@apache.org
>> >>>>>> * Zheng Shao, zshao@apache.org
>> >>>>>
>> >>>>> Mentors must be members of the IPMC and almost always Members of
>>the
>> >>>>>ASF.
>> >>>>>
>> >>>>> At this moment only Luke Han is qualified.
>> >>>>>
>> >>>>> Regards,
>> >>>>> Dave
>> >>>>>
>> >>>>>>
>> >>>>>> ###Sponsoring Entity
>> >>>>>>
>> >>>>>> We are requesting the Incubator to sponsor this project.
>> >>>>>
>> >>>>>
>>
>>

Re: Looking for Champion

Posted by Jim Apple <jb...@cloudera.com.INVALID>.

I don't want to be a stickler, but I don't think "For issues mentioned by
Jim, Todd and Tim, I have replied on last Saturday."

To my email about Palo being an ASF project as a storage system without a
query engine, you replied only, "We will seriously consider this proposal."

I see no response to Tim's concern that "The code isn't owned by any
individual, I contributed it to Apache and it's
free for anyone to do what they want to do with it, but pulling in
improvements from other projects without any attempt to attribute it or
contribute improvements back seems contrary to the Apache way."

On Thu, Jun 14, 2018 at 12:48 AM, Li,De(BDG) <li...@baidu.com> wrote:

> Hi all,
>
> Regarding Palo, we have fixed the following issues.
>
> 1. Impala-related issues
> For issues mentioned by Jim, Todd and Tim, I have replied on last Saturday.
>
> 2. License issues
> For issues mentioned by Todd and Ted.
> 1) be/aes/* comes from mysql-5.6, GPL v2.1 license
> Fixed: removed the AES-related code.
> https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
> https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced
>
> 2) be/util/mysql_dtoa.cpp was copied from MySQL, GPL license
> Fixed: removed the mysql_dtoa-related code.
> https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1
>
> 3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
> Fixed: restored the original license; we are looking for another HTTP
> server to replace it.
> https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831
>
> 4) be/rpc/*
> Fixed: we have replaced it with brpc, and we will remove Hypertable in a
> few weeks, after users have had time to upgrade to brpc.
> https://github.com/baidu/palo/tree/master/be/src/rpc
>
> 3. Dependency licenses
> For the issue mentioned by Dave: it looks like Palo does not depend on
> OpenLdap and cyrus-sasl directly, but some third-party libraries need
> them to compile, libcurl and gperftools for instance.
> For rapidjson, we are looking for an alternative.
>
> 4. About the name of Palo
> For the issue mentioned by Julian: we are working on a better one.
>
> Best Regards,
> Reed
>
>
>
> On 2018/6/13 8:54 AM, "Li,De(BDG)" <li...@baidu.com> wrote:
>
> >Hi Julian,
> >
> >Thank you.
> >
> >It looks like we have to find another one.
> >If anyone has a good name, please feel free to let me know.
> >
> >Best Regards,
> >Reed
> >
> >On 2018/6/13 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:
> >
> >>Note that there is an existing database product called Palo - an open
> >>source OLAP engine by German company Jedox[1]. There is a high
> >>likelihood that Palo would have to change its name during incubation, if
> >>accepted.
> >>
> >>Julian
> >>
> >>[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
> >><https://en.wikipedia.org/wiki/Palo_(OLAP_database)>
> >>
> >>
> >>
> >>> On Jun 10, 2018, at 3:49 AM, Han Luke <lu...@gmail.com> wrote:
> >>>
> >>> Cool Dave, it’s great to have you as the Champion.
> >>>
> >>>
> >>> ________________________________
> >>> From: Tan,Zhongyi <tanzhongyi@baidu.com>
> >>> Sent: Saturday, June 9, 2018 8:16:28 AM
> >>> To: general@incubator.apache.org
> >>> Subject: Re: Looking for Champion
> >>>
> >>> Thanks, Willem.
> >>>
> >>> We really appreciate it.
> >>>
> >>>> On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> I'm willing to be the Mentor.
> >>>> Please count me in.
> >>>>
> >>>>
> >>>>
> >>>> Willem Jiang
> >>>>
> >>>> Twitter: willemjiang
> >>>> Weibo: 姜宁willem
> >>>>
> >>>>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net>
> >>>>>wrote:
> >>>>>
> >>>>> Hi -
> >>>>>
> >>>>> I’m willing to Champion and Mentor. I have a couple of comments
> >>>>>inline.
> >>>>> I’ll look at dependency licenses later today. It’s early for me.
> >>>>>
> >>>>>
> >>>>>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
> >>>>>>
> >>>>>> Hi all,
> >>>>>>
> >>>>>> I am Reed, as a developer worked with the team for Palo (a MPP-based
> >>>>> interactive SQL data warehousing).
> >>>>>> https://github.com/baidu/palo/wiki/Palo-Overview
> >>>>>>
> >>>>>> We propose to contribute Palo as an Apache Incubator project, and
> >>>>>> we are still looking for possible Champion if anyone would like to
> >>>>> volunteer. Thanks a lot.
> >>>>>>
> >>>>>> Best Regards,
> >>>>>> Reed
> >>>>>>
> >>>>>> ===================
> >>>>>> The draft of the proposal as below:
> >>>>>>
> >>>>>> #Apache Palo
> >>>>>>
> >>>>>> ##Abstract
> >>>>>>
> >>>>>> Palo is a MPP-based interactive SQL data warehousing for reporting
> >>>>>>and
> >>>>> analysis.
> >>>>>>
> >>>>>> ##Proposal
> >>>>>>
> >>>>>> We propose to contribute the Palo codebase and associated artifacts (e.g. documentation, website content, etc.) to the Apache Software Foundation with the intent of forming a productive, meritocratic and open community around Palo’s continued development, according to the ‘Apache Way’.
> >>>>>>
> >>>>>> Baidu owns several trademarks regarding Palo, and proposes to transfer ownership of those trademarks in full to the ASF.
> >>>>>>
> >>>>>> ###Overview of Palo
> >>>>>>
> >>>>>> Palo’s implementation consists of two daemons: Frontend (FE) and Backend (BE).
> >>>>>>
> >>>>>> **Frontend daemon** consists of a query coordinator and a catalog manager. The query coordinator is responsible for receiving users’ SQL queries, compiling them and managing their execution. The catalog manager is responsible for managing metadata such as databases, tables, partitions and replicas. Several frontend daemons can be deployed to provide fault tolerance and load balancing.
> >>>>>>
> >>>>>> **Backend daemon** stores the data and executes the query fragments. Many backend daemons can be deployed to provide scalability and fault tolerance.
> >>>>>>
> >>>>>> A typical Palo cluster consists of several frontend daemons and dozens to hundreds of backend daemons.
> >>>>>>
> >>>>>> Users can connect to any frontend daemon with MySQL client tools to submit SQL queries. The Frontend receives a query and compiles it into query plans executable by the Backend, then sends the query plan fragments to the Backend. The Backend builds a query execution DAG, and data is fetched and pipelined into the DAG. The final result is sent to the client via the Frontend. The main goal when distributing query fragment execution is to minimize data movement and maximize scan locality.
> >>>>>>
> >>>>>> ##Background
> >>>>>>
> >>>>>> At Baidu, prior to Palo, different tools were deployed to meet diverse requirements. When a use case required capabilities that no single tool could provide together, users were forced to build hybrid architectures that stitched multiple tools together. We believe they shouldn’t need to accept such inherent complexity. A storage system built to provide great performance across a broad range of workloads offers a more elegant solution to the problems that hybrid architectures aim to solve. Palo is that solution.
> >>>>>>
> >>>>>> Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides high-concurrency, low-latency point query performance as well as high-throughput ad-hoc analytical queries. Palo provides bulk batch data loading as well as near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability.
> >>>>>>
> >>>>>> ##Rationale
> >>>>>>
> >>>>>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
> >>>>>>
> >>>>>> Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of user and system requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.
> >>>>>>
> >>>>>> Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. By virtue of its superior performance and rich functionality, Impala is now comparable to many commercial MPP database query engines. Mesa can satisfy many of our storage requirements, but Mesa itself does not provide a SQL query engine; Impala is a very good MPP SQL query engine, but it lacks a perfect distributed storage engine. So in the end we chose to combine these two technologies.
> >>>>>>
> >>>>>> Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. We then deeply integrated this storage engine with the Impala query engine. Query compilation, query execution coordination and catalog management of the storage engine are integrated into the frontend daemon; query execution and data storage are integrated into the backend daemon. With this integration, we implemented a single, full-featured, high-performance, state-of-the-art MPP database while maintaining simplicity.
> >>>>>>
> >>>>>> ##Current Status
> >>>>>>
> >>>>>> Palo is an open source project on GitHub (https://github.com/baidu/palo).
> >>>>>>
> >>>>>> ###Meritocracy
> >>>>>>
> >>>>>> Palo has been deployed in production at Baidu and serves more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data. Still, we look forward to growing a rich user and developer community.
> >>>>>>
> >>>>>> ###Community
> >>>>>>
> >>>>>> Palo seeks to develop developer and user communities during incubation.
> >>>>>>
> >>>>>> ###Core Developers
> >>>>>>
> >>>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> >>>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> >>>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> >>>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> >>>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> >>>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> >>>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >>>>>>
> >>>>>> ###Alignment
> >>>>>>
> >>>>>> Palo is related to several other Apache projects:
> >>>>>>
> >>>>>> * Palo can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
> >>>>>> * Palo is closely integrated with Impala, which is also being proposed to the Incubator.
> >>>>>
> >>>>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
> >>>>>
> >>>>>> * Palo uses Apache Thrift as its RPC and serialization framework of choice.
> >>>>>>
> >>>>>> ##Known Risks
> >>>>>>
> >>>>>> ###Orphaned Products
> >>>>>>
> >>>>>> The core developers of the Palo team plan to work full time on this project. There is very little risk of Palo being orphaned, since at least one large company (Baidu) uses it extensively in production. For example, there are currently more than 200 use cases running on Palo in production. Furthermore, since Palo was open sourced at the beginning of October 2017, it has received more than 660 stars and been forked nearly 170 times. We plan to extend and diversify this community further through Apache.
> >>>>>>
> >>>>>> ###Inexperience with Open Source
> >>>>>>
> >>>>>> The core developers are all active users and followers of open source. They are already committers and contributors to the Palo GitHub project. All have worked on source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core developers do not have Apache open source experience, there are plans to onboard individuals with Apache open source experience onto the project.
> >>>>>>
> >>>>>> ###Homogenous Developers
> >>>>>>
> >>>>>> Most of the core developers are from Baidu, but after Palo was open sourced, it received a lot of bug fixes and enhancements from developers not working at Baidu.
> >>>>>>
> >>>>>> ###Reliance on Salaried Developers
> >>>>>>
> >>>>>> Baidu invested in Palo as its OLAP solution, and some of its key engineers are working full time on the project. In addition, since there is a growing Big Data need for scalable OLAP solutions, we look forward to other Apache developers and researchers contributing to the project. Key to addressing the risk of relying on salaried developers from a single entity is to increase the diversity of the contributors and to actively lobby for domain experts in the BI space to contribute. Apache Palo intends to do this.
> >>>>>>
> >>>>>> ###An Excessive Fascination with the Apache Brand
> >>>>>>
> >>>>>> Palo is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Palo project is already in production use inside Baidu, but is not expected to be a Baidu product for external customers. As such, the Palo project is not seeking to use the Apache brand as a marketing tool.
> >>>>>>
> >>>>>> ##Documentation
> >>>>>>
> >>>>>> Information about Palo can be found at https://github.com/baidu/palo. The following links provide more information about Palo in open source:
> >>>>>>
> >>>>>> * Palo wiki site: https://github.com/baidu/palo/wiki
> >>>>>> * Codebase at Github: https://github.com/baidu/palo
> >>>>>> * Issue Tracking: https://github.com/baidu/palo/issues
> >>>>>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> >>>>>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
> >>>>>>
> >>>>>> ##Initial Source
> >>>>>>
> >>>>>> Palo has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on GitHub under an Apache license at https://github.com/baidu/palo.
> >>>>>>
> >>>>>> ##External Dependencies
> >>>>>>
> >>>>>> Palo has the following external dependencies.
> >>>>>>
> >>>>>> * Google gflags (BSD)
> >>>>>> * Google glog (BSD)
> >>>>>> * Apache Thrift (Apache Software License v2.0)
> >>>>>> * Apache Commons (Apache Software License v2.0)
> >>>>>> * Boost (Boost Software License)
> >>>>>> * OpenLdap (OpenLDAP Software License)
> >>>>>> * rapidjson (Tencent)
> >>>>>> * Google RE2 (BSD-style)
> >>>>>> * lz4 (BSD)
> >>>>>> * snappy (BSD)
> >>>>>> * cyrus-sasl (CMU License)
> >>>>>> * Twitter Bootstrap (Apache Software License v2.0)
> >>>>>> * d3 (BSD)
> >>>>>> * LLVM (BSD-like)
> >>>>>>
> >>>>>> Build and test dependencies:
> >>>>>>
> >>>>>> * ant (Apache Software License v2.0)
> >>>>>> * Apache Maven (Apache Software License v2.0)
> >>>>>> * cmake (BSD)
> >>>>>> * clang (BSD)
> >>>>>> * Google gtest (Apache Software License v2.0)
> >>>>>>
> >>>>>> ##Required Resources
> >>>>>>
> >>>>>> ###Mailing List
> >>>>>>
> >>>>>> There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:
> >>>>>>
> >>>>>> private@palo.incubator.apache.org
> >>>>>> dev@palo.incubator.apache.org
> >>>>>> commits@palo.incubator.apache.org
> >>>>>>
> >>>>>> ###Subversion Directory
> >>>>>>
> >>>>>> Upon entering incubation: https://github.com/baidu/palo.
> >>>>>> After incubation, we want to move the existing repo from https://github.com/baidu/palo to Apache infrastructure.
> >>>>>>
> >>>>>> ###Issue Tracking
> >>>>>>
> >>>>>> Palo currently uses GitHub to track issues. We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.
> >>>>>>
> >>>>>> ###Other Resources
> >>>>>>
> >>>>>> The existing code already has unit tests, so we will make use of the existing Apache continuous testing infrastructure. The resulting load should not be very large.
> >>>>>>
> >>>>>> ##Initial Committers
> >>>>>>
> >>>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> >>>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> >>>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> >>>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> >>>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> >>>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> >>>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >>>>>>
> >>>>>> ##Affiliations
> >>>>>>
> >>>>>> The initial committers are employees of Baidu Inc. The nominated mentors are employees of TODO.
> >>>>>>
> >>>>>> ##Sponsors
> >>>>>>
> >>>>>> ###Champion
> >>>>>>
> >>>>>> TODO
> >>>>>>
> >>>>>> ###Nominated Mentors
> >>>>>>
> >>>>>> * Sijie Guo, guosijie@gmail.com
> >>>>>> * Luke Han, lukehan@apache.org
> >>>>>> * Zheng Shao, zshao@apache.org
> >>>>>
> >>>>> Mentors must be members of the IPMC and almost always Members of the ASF.
> >>>>>
> >>>>> At this moment only Luke Han is qualified.
> >>>>>
> >>>>> Regards,
> >>>>> Dave
> >>>>>
> >>>>>>
> >>>>>> ###Sponsoring Entity
> >>>>>>
> >>>>>> We are requesting the Incubator to sponsor this project.
> >>>>>
> >>>>>
>
>

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi all,

We have addressed the following issues raised about Palo.

1. Relationship to Impala
I replied last Saturday to the issues raised by Jim, Todd and Tim.

2. License issues
These were raised by Todd and Ted.
1) be/aes/* came from mysql-5.6 (GPL v2 license).
Fixed: removed the aes-related code.
https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7
https://github.com/baidu/palo/commit/3c9f2ae6695ffebe41e39b6bf6544077698f1ced

2) be/util/mysql_dtoa.cpp was copied from MySQL (GPL license).
Fixed: removed the mysql_dtoa-related code.
https://github.com/baidu/palo/commit/bfe1bc7cf39e165a7c52b2c941550975b1f841a1

3) be/http/mongoose.h, Copyright (c) 2004-2012 Sergey Lyubka
Fixed: restored the original license. We are looking for another HTTP server
to replace it.
https://github.com/baidu/palo/commit/81baef34f48a2dbe7401712c5e0a50f59f04a831

4) be/rpc/*
Fixed: we have replaced it with brpc, and we will remove the Hypertable code
in a few weeks, once users have upgraded to brpc.
https://github.com/baidu/palo/tree/master/be/src/rpc

3. Dependency licenses
Regarding the issue raised by Dave: it looks like Palo does not depend on
OpenLDAP and cyrus-sasl directly, but some third-party libraries, libcurl
and gperftools for instance, need them to compile.
For rapidjson, we are looking for an alternative.

4. The name Palo
Regarding the issue raised by Julian: we are working on a better name.

Best Regards,
Reed



On 2018/6/13 at 8:54 AM, "Li,De(BDG)" <li...@baidu.com> wrote:

>Hi Julian,
>
>Thank you.
>
>It looks like we have to find another name.
>If anyone has a good name, please feel free to let me know.
>
>Best Regards,
>Reed
>
>On 2018/6/13 at 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:
>
>>Note that there is an existing database product called Palo - an open
>>source OLAP engine by German company Jedox[1]. There is a high
>>likelihood that Palo would have to change its name during incubation, if
>>accepted.
>>
>>Julian
>>
>>[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)
>>
>>
>>

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Julian,

Thank you.

It looks like we will have to find another name.
If anyone has a good name, please feel free to let me know.

Best Regards,
Reed

On 2018/6/13, 4:20 AM, "Julian Hyde" <jh...@apache.org> wrote:

>Note that there is an existing database product called Palo - an open
>source OLAP engine by German company Jedox[1]. There is a high
>likelihood that Palo would have to change its name during incubation, if
>accepted.
>
>Julian
>
>[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)

Re: Looking for Champion

Posted by Julian Hyde <jh...@apache.org>.

Note that there is an existing database product called Palo - an open source OLAP engine by German company Jedox[1]. There is a high likelihood that Palo would have to change its name during incubation, if accepted.

Julian

[1] https://en.wikipedia.org/wiki/Palo_(OLAP_database)



> On Jun 10, 2018, at 3:49 AM, Han Luke <lu...@gmail.com> wrote:
> 
> Cool Dave, it’s great to have you as the Champion.
> 
> 
> ________________________________
> From: Tan,Zhongyi <tanzhongyi@baidu.com>
> Sent: Saturday, June 9, 2018 8:16:28 AM
> To: general@incubator.apache.org
> Subject: Re: Looking for Champion
> 
> Thanks, Willem.
> 
> We really appreciate it.
> 
>> On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:
>> 
>> Hi,
>> 
>> I'm willing to be the Mentor.
>> Please count me in.
>> 
>> 
>> 
>> Willem Jiang
>> 
>> Twitter: willemjiang
>> Weibo: 姜宁willem
>> 
>>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net> wrote:
>>> 
>>> Hi -
>>> 
>>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>>> I’ll look at dependency licenses later today. It’s early for me.
>>> 
>>> 
>>>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
>>>> 
>>>> Hi all,
>>>> 
>>>> I am Reed, a developer working with the team on Palo (an MPP-based
>>> interactive SQL data warehouse).
>>>> https://github.com/baidu/palo/wiki/Palo-Overview
>>>> 
>>>> We propose to contribute Palo as an Apache Incubator project, and
>>>> we are still looking for a Champion, if anyone would like to
>>> volunteer. Thanks a lot.
>>>> 
>>>> Best Regards,
>>>> Reed
>>>> 
>>>> ===================
>>>> The draft of the proposal is below:
>>>> 
>>>> #Apache Palo
>>>> 
>>>> ##Abstract
>>>> 
>>>> Palo is an MPP-based interactive SQL data warehouse for reporting and
>>> analysis.
>>>> 
>>>> ##Proposal
>>>> 
>>>> We propose to contribute the Palo codebase and associated artifacts
>>> (e.g. documentation, web-site content etc.) to the Apache Software
>>> Foundation with the intent of forming a productive, meritocratic and open
>>> community around Palo’s continued development, according to the ‘Apache
>>> Way’.
>>>> 
>>>> Baidu owns several trademarks regarding Palo, and proposes to transfer
>>> ownership of those trademarks in full to the ASF.
>>>> 
>>>> ###Overview of Palo
>>>> 
>>>> Palo’s implementation consists of two daemons: Frontend (FE) and Backend
>>> (BE).
>>>> 
>>>> The **Frontend daemon** consists of a query coordinator and a catalog manager.
>>> The query coordinator is responsible for receiving users’ SQL queries,
>>> compiling them, and managing their execution. The catalog manager is
>>> responsible for managing metadata such as databases, tables, partitions,
>>> and replicas. Several frontend daemons can be deployed to provide
>>> fault tolerance and load balancing.
>>>> 
>>>> **Backend daemon** stores the data and executes the query fragments.
>>> Many backend daemons could also be deployed to provide scalability and
>>> fault-tolerance.
>>>> 
>>>> A typical Palo cluster consists of several frontend daemons
>>> and dozens to hundreds of backend daemons.
>>>> 
>>>> Users can use MySQL client tools to connect to any frontend daemon and
>>> submit SQL queries. The Frontend receives a query and compiles it into query
>>> plans executable by the Backend, then sends the query plan
>>> fragments to the Backend. The Backend builds a query execution DAG, and data is
>>> fetched and pipelined into the DAG. The final result is sent to
>>> the client via the Frontend. The distribution of query fragment execution aims
>>> mainly to minimize data movement and maximize scan locality.
>>>> 
>>>> ##Background
>>>> 
>>>> At Baidu, prior to Palo, different tools were deployed to meet diverse
>>> requirements. When a use case required the simultaneous
>>> availability of capabilities that no single tool could provide,
>>> users were forced to build hybrid architectures that stitched multiple tools
>>> together. We believe that they shouldn’t need to accept such inherent
>>> complexity. A storage system built to provide great performance across a
>>> broad range of workloads provides a more elegant solution to the problems
>>> that hybrid architectures aim to solve. Palo is that solution.
>>>> 
>>>> Palo is designed to be a simple, single, tightly coupled system that does not
>>> depend on other systems. Palo provides high-concurrency, low-latency point
>>> queries as well as high-throughput ad-hoc
>>> analysis queries. Palo supports bulk batch data loading as well as near
>>> real-time mini-batch data loading. Palo also provides high availability,
>>> reliability, fault tolerance, and scalability.
>>>> 
>>>> ##Rationale
>>>> 
>>>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>>>> 
>>>> Mesa is a highly scalable analytic data storage system that stores
>>> critical measurement data related to Google's Internet advertising
>>> business. Mesa is designed to satisfy a complex and challenging set of users’
>>> and systems’ requirements, including near real-time data ingestion and
>>> query ability, as well as high availability, reliability, fault tolerance,
>>> and scalability for large data and query volumes.
>>>> 
>>>> Impala is a modern, open-source MPP SQL engine architected from the
>>> ground up for the Hadoop data processing environment. By virtue
>>> of its superior performance and rich functionality, Impala has become
>>> comparable to many commercial MPP database query engines. Mesa can satisfy
>>> many of our storage requirements, but Mesa itself does not
>>> provide a SQL query engine; Impala is a very good MPP SQL query engine, but
>>> it lacks a perfect distributed storage engine. So in the end we chose
>>> to combine these two technologies.
>>>> 
>>>> Learning from Mesa’s data model, we developed a distributed storage
>>> engine. Unlike Mesa, this storage engine does not rely on any distributed
>>> file system. We then deeply integrated this storage engine with the Impala query
>>> engine. Query compiling, query execution coordination, and catalog
>>> management of the storage engine are integrated into the frontend daemon; query
>>> execution and data storage are integrated into the backend daemon. With this
>>> integration, we implemented a single, full-featured, high-performance,
>>> state-of-the-art MPP database while maintaining simplicity.
>>>> 
>>>> ##Current Status
>>>> 
>>>> Palo has been an open source project on GitHub (
>>> https://github.com/baidu/palo).
>>>> 
>>>> ###Meritocracy
>>>> 
>>>> Palo has been deployed in production at Baidu and is used by more than
>>> 200 lines of business. It has demonstrated great performance benefits and
>>> has proved to be a better way to do reporting and analysis on big data.
>>> Still, we look forward to growing a rich user and developer community.
>>>> 
>>>> ###Community
>>>> 
>>>> Palo seeks to develop developer and user communities during incubation.
>>>> 
>>>> ###Core Developers
>>>> 
>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>>>> 
>>>> ###Alignment
>>>> 
>>>> Palo is related to several other Apache projects:
>>>> 
>>>> * Palo can also read data stored in Apache Hadoop clusters powered by
>>> the HDFS filesystem.
>>>> * Palo is closely integrated with Impala, which is also being proposed
>>> to the Incubator.
>>> 
>>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>>> 
>>>> * Palo uses Apache Thrift as its RPC and serialization framework of
>>> choice.
>>>> 
>>>> ##Known Risks
>>>> 
>>>> ###Orphaned Products
>>>> 
>>>> The core developers of the Palo team plan to work full time on this project.
>>> There is very little risk of Palo being orphaned, since at least one large
>>> company (Baidu) uses it extensively in production. For example,
>>> there are currently more than 200 use cases running on Palo in production.
>>> Furthermore, since Palo was open sourced at the beginning of October 2017,
>>> it has received more than 660 stars and been forked nearly 170 times. We
>>> plan to extend and diversify this community further through Apache.
>>>> 
>>>> ###Inexperience with Open Source
>>>> 
>>>> The core developers are all active users and followers of open source.
>>> They are already committers and contributors to the Palo GitHub project.
>>> All have worked on source code that has been released under an
>>> open source license, and several of them also have experience developing
>>> code in an open source environment. Though the core developers do
>>> not have Apache open source experience, there are plans to onboard
>>> individuals with Apache open source experience onto the project.
>>>> 
>>>> ###Homogenous Developers
>>>> 
>>>> Most of the core developers are from Baidu, but since Palo was open
>>> sourced, it has received many bug fixes and enhancements from
>>> developers not working at Baidu.
>>>> 
>>>> ###Reliance on Salaried Developers
>>>> 
>>>> Baidu has invested in Palo as its OLAP solution, and some of its key
>>> engineers are working full time on the project. In addition, since there is
>>> a growing Big Data need for scalable OLAP solutions, we look forward to
>>> other Apache developers and researchers contributing to the project. Also
>>> key to addressing the risk of relying on salaried developers
>>> from a single entity is to increase the diversity of the contributors and
>>> actively lobby for domain experts in the BI space to contribute. Apache
>>> Palo intends to do this.
>>>> 
>>>> ###An Excessive Fascination with the Apache Brand
>>>> 
>>>> Palo is proposing to enter incubation at Apache in order to help efforts
>>> to diversify the committer base, not so much to capitalize on the Apache
>>> brand. The Palo project is already in production use inside Baidu, but is
>>> not expected to be a Baidu product for external customers. As such, the
>>> Palo project is not seeking to use the Apache brand as a marketing tool.
>>>> 
>>>> ##Documentation
>>>> 
>>>> Information about Palo can be found at https://github.com/baidu/palo.
>>> The following links provide more information about Palo in open source:
>>>> 
>>>> * Palo wiki site: https://github.com/baidu/palo/wiki
>>>> * Codebase at Github: https://github.com/baidu/palo
>>>> * Issue Tracking: https://github.com/baidu/palo/issues
>>>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>>>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>>>> 
>>>> ##Initial Source
>>>> 
>>>> Palo has been under development since 2017 by a team of engineers at
>>> Baidu Inc. It is currently hosted on Github.com under an Apache license at
>>> https://github.com/baidu/palo.
>>>> 
>>>> ##External Dependencies
>>>> 
>>>> Palo has the following external dependencies.
>>>> 
>>>> * Google gflags (BSD)
>>>> * Google glog (BSD)
>>>> * Apache Thrift (Apache Software License v2.0)
>>>> * Apache Commons (Apache Software License v2.0)
>>>> * Boost (Boost Software License)
>>>> * OpenLdap (OpenLDAP Software License)
>>>> * rapidjson (Tencent)
>>>> * Google RE2 (BSD-style)
>>>> * lz4 (BSD)
>>>> * snappy (BSD)
>>>> * cyrus-sasl (CMU License)
>>>> * Twitter Bootstrap (Apache Software License v2.0)
>>>> * d3 (BSD)
>>>> * LLVM (BSD-like)
>>>> 
>>>> Build and test dependencies:
>>>> 
>>>> * ant (Apache Software License v2.0)
>>>> * Apache Maven (Apache Software License v2.0)
>>>> * cmake (BSD)
>>>> * clang (BSD)
>>>> * Google gtest (Apache Software License v2.0)
>>>> 
>>>> ##Required Resources
>>>> 
>>>> ###Mailing List
>>>> 
>>>> There are currently no mailing lists. The usual mailing lists are
>>> expected to be set up when entering incubation:
>>>> 
>>>> private@palo.incubator.apache.org
>>>> dev@palo.incubator.apache.org
>>>> commits@palo.incubator.apache.org
>>>> 
>>>> ###Subversion Directory
>>>> 
>>>> Upon entering incubation: https://github.com/baidu/palo.
>>>> After incubation, we want to move the existing repo from
>>> https://github.com/baidu/palo to Apache infrastructure.
>>>> 
>>>> ###Issue Tracking
>>>> 
>>>> Palo currently uses GitHub to track issues. We would like to continue to do
>>> so while we discuss migration possibilities with the ASF Infra committee.
>>>> 
>>>> ###Other Resources
>>>> 
>>>> The existing code already has unit tests so we will make use of existing
>>> Apache continuous testing infrastructure. The resulting load should not be
>>> very large.
>>>> 
>>>> ##Initial Committers
>>>> 
>>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>>>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>>>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>>>> 
>>>> ##Affiliations
>>>> 
>>>> The initial committers are employees of Baidu Inc. The nominated
>>> mentors are employees of TODO.
>>>> 
>>>> ##Sponsors
>>>> 
>>>> ###Champion
>>>> 
>>>> TODO
>>>> 
>>>> ###Nominated Mentors
>>>> 
>>>> * Sijie Guo, guosijie@gmail.com
>>>> * Luke Han, lukehan@apache.org
>>>> * Zheng Shao, zshao@apache.org
>>> 
>>> Mentors must be members of the IPMC and almost always Members of the ASF.
>>> 
>>> At this moment only Luke Han is qualified.
>>> 
>>> Regards,
>>> Dave
>>> 
>>>> 
>>>> ###Sponsoring Entity
>>>> 
>>>> We are requesting the Incubator to sponsor this project.
>>> 
>>> 

Re: Looking for Champion

Posted by Han Luke <lu...@gmail.com>.

Cool Dave, it’s great to have you as the Champion.


________________________________
From: Tan,Zhongyi <ta...@baidu.com>
Sent: Saturday, June 9, 2018 8:16:28 AM
To: general@incubator.apache.org
Subject: Re: Looking for Champion

Thanks, Willem.

We really appreciate it.

> On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:
>
> Hi,
>
> I'm willing to be the Mentor.
> Please count me in.
>
>
>
> Willem Jiang
>
> Twitter: willemjiang
> Weibo: 姜宁willem
>
>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net> wrote:
>>
>> Hi -
>>
>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>> I’ll look at dependency licenses later today. It’s early for me.
>>
>>
>>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
>>>
>>> Hi all,
>>>
>>> I am Reed, as a developer worked with the team for Palo (a MPP-based
>> interactive SQL data warehousing).
>>> https://github.com/baidu/palo/wiki/Palo-Overview
>>>
>>> We propose to contribute Palo as an Apache Incubator project, and
>>> we are still looking for possible Champion if anyone would like to
>> volunteer. Thanks a lot.
>>>
>>> Best Regards,
>>> Reed
>>>
>>> ===================
>>> The draft of the proposal as below:
>>>
>>> #Apache Palo
>>>
>>> ##Abstract
>>>
>>> Palo is a MPP-based interactive SQL data warehousing for reporting and
>> analysis.
>>>
>>> ##Proposal
>>>
>>> We propose to contribute the Palo codebase and associated artifacts
>> (e.g. documentation, web-site content etc.) to the Apache Software
>> Foundation with the intent of forming a productive, meritocratic and open
>> community around Palo’s continued development, according to the ‘Apache
>> Way’.
>>>
>>> Baidu owns several trademarks regarding Palo, and proposes to transfer
>> ownership of those trademarks in full to the ASF.
>>>
>>> ###Overview of Palo
>>>
>>> Palo’s implementation consists of two daemons: Frontend (FE) and Backend
>> (BE).
>>>
>>> **Frontend daemon** consists of query coordinator and catalog manager.
>> Query coordinator is responsible for receiving users’ sql queries,
>> compiling queries and managing queries execution. Catalog manager is
>> responsible for managing metadata such as databases, tables, partitions,
>> replicas and etc. Several frontend daemons could be deployed to guarantee
>> fault-tolerance, and load balancing.
>>>
>>> **Backend daemon** stores the data and executes the query fragments.
>> Many backend daemons could also be deployed to provide scalability and
>> fault-tolerance.
>>>
>>> A typical Palo cluster generally composes of several frontend daemons
>> and dozens to hundreds of backend daemons.
>>>
>>> Users can use MySQL client tools to connect any frontend daemon to
>> submit SQL query. Frontend receives the query and compiles it into query
>> plans executable by the Backend. Then Frontend sends the query plan
>> fragments to Backend. Backend will build a query execution DAG. Data is
>> fetched and pipelined into the DAG. The final result response is sent to
>> client via Frontend. The distribution of query fragment execution takes
>> minimizing data movement and maximizing scan locality as the main goal.
>>>
>>> ##Background
>>>
>>> At Baidu, Prior to Palo, different tools were deployed to solve diverse
>> requirements in many ways. And when a use case requires the simultaneous
>> availability of capabilities that cannot all be provided by a single tool,
>> users were forced to build hybrid architectures that stitch multiple tools
>> together, but we believe that they shouldn’t need to accept such inherent
>> complexity. A storage system built to provide great performance across a
>> broad range of workloads provides a more elegant solution to the problems
>> that hybrid architectures aim to solve. Palo is the solution.
>>>
>>> Palo is designed to be a simple and single tightly coupled system, not
>> depending on other systems. Palo provides high concurrent low latency point
>> query performance, but also provides high throughput queries of ad-hoc
>> analysis. Palo provides bulk-batch data loading, but also provides near
>> real-time mini-batch data loading. Palo also provides high availability,
>> reliability, fault tolerance, and scalability.
>>>
>>> ##Rationale
>>>
>>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>>>
>>> Mesa is a highly scalable analytic data storage system that stores
>> critical measurement data related to Google's Internet advertising
>> business. Mesa is designed to satisfy complex and challenging set of users’
>> and systems’ requirements, including near real-time data ingestion and
>> query ability, as well as high availability, reliability, fault tolerance,
>> and scalability for large data and query volumes.
>>>
>>> Impala is a modern, open-source MPP SQL engine architected from the
>> ground up for the Hadoop data processing environment. At present, by virtue
>> of its superior performance and rich functionality, Impala has been
>> comparable to many commercial MPP database query engine. Mesa can satisfy
>> the needs of many of our storage requirements, however Mesa itself does not
>> provide a SQL query engine; Impala is a very good MPP SQL query engine, but
>> the lack of a perfect distributed storage engine. So in the end we chose
>> the combination of these two technologies.
>>>
>>> Learning from Mesa’s data model, we developed a distributed storage
>> engine. Unlike Mesa, this storage engine does not rely on any distributed
>> file system. Then we deeply integrate this storage engine with Impala query
>> engine. Query compiling, query execution coordination and catalog
>> management of storage engine are integrated to be frontend daemon; query
>> execution and data storage are integrated to be backend daemon. With this
>> integration, we implemented a single, full-featured, high performance state
>> the art of MPP database, as well as maintaining the simplicity.
>>>
>>> ##Current Status
>>>
>>> Palo has been an open source project on GitHub (
>> https://github.com/baidu/palo).
>>>
>>> ###Meritocracy
>>>
>>> Palo has been deployed in production at Baidu and is applying more than
>> 200 lines of business. It has demonstrated great performance benefits and
>> has proved to be a better way for reporting and analysis based big data.
>> Still We look forward to growing a rich user and developer community.
>>>
>>> ###Community
>>>
>>> Palo seeks to develop developer and user communities during incubation.
>>>
>>> ###Core Developers
>>>
>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com<mailto:maruy
>> ue@baidu.com>)
>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com<mailto:bu
>> aa.zhaoc@gmail.com>)
>>> * Mingyu Chen (https://github.com/morningman,chenmingyu@baidu.com)
>>> * De Li（https://github.com/lide-reed, mailtolide@sina.com）<mailto:ma
>> iltolide@sina.com%EF%BC%89>
>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com
>> <ma...@baidu.com>)
>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com<mailto:
>> lichaoyong@baidu.com>)
>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com<mailto:lin
>> gbinlb@gmail.com>)
>>>
>>> ###Alignment
>>>
>>> Palo is related to several other Apache projects:
>>>
>>> * Palo can also read data stored in Apache Hadoop clusters powered by
>> the HDFS filesystem.
>>> * Palo is closely integrated with Impala, which is also being proposed
>> to the Incubator.
>>
>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>>
>>> * Palo uses Apache Thrift as its RPC and serialization framework of
>> choice.
>>>
>>> ##Known Risks
>>>
>>> ###Orphaned Products
>>>
>>> The core developers of Palo team plan to work full time on this project.
>> There is very little risk of Palo getting orphaned since at least one large
>> company (Baidu) is extensively using it in their production. For example,
>> currently there are more than 200 use cases using Palo in production.
>> Furthermore, since Palo was open sourced at the beginning of October 2017,
>> it has received more than 660 stars and been forked nearly 170 times. We
>> plan to extend and diversify this community further through Apache.
>>>
>>> ###Inexperience with Open Source
>>>
>>> The core developers are all active users and followers of open source.
>> They are already committers and contributors to the Palo Github project.
>> All have been involved with the source code that has been released under an
>> open source license, and several of them also have experience developing
>> code in an open source environment. Though the core set of Developers do
>> not have Apache Open Source experience, there are plans to onboard
>> individuals with Apache open source experience on to the project.
>>>
>>> ###Homogenous Developers
>>>
>>> The most of core developers are from Baidu, but after Palo was open
>> sourced, Palo received a lot of bug fixes and enhancements from other
>> developers not working at Baidu.
>>>
>>> ###Reliance on Salaried Developers
>>>
>>> Baidu invested in Palo as the OLAP solution and some of its key
>> engineers are working full time on the project. In addition, since there is
>> a growing Big Data need for scalable OLAP solutions, we look forward to
>> other Apache developers and researchers to contribute to the project. Also
>> key to addressing the risk associated with relying on Salaried developers
>> from a single entity is to increase the diversity of the contributors and
>> actively lobby for Domain experts in the BI space to contribute. Apache
>> Palo intends to do this.
>>>
>>> ###An Excessive Fascination with the Apache Brand
>>>
>>> Palo is proposing to enter incubation at Apache in order to help efforts
>> to diversify the committer-base, not so much to capitalize on the Apache
>> brand. The Palo project is in production use already inside Baidu, but is
>> not expected to be an Baidu product for external customers. As such, the
>> Palo project is not seeking to use the Apache brand as a marketing tool.
>>>
>>> ##Documentation
>>>
>>> Information about Palo can be found at https://github.com/baidu/palo.
>> The following links provide more information about Palo in open source:
>>>
>>> * Palo wiki site: https://github.com/baidu/palo/wiki
>>> * Codebase at Github: https://github.com/baidu/palo
>>> * Issue Tracking: https://github.com/baidu/palo/issues
>>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>>>
>>> ##Initial Source
>>>
>>> Palo has been under development since 2017 by a team of engineers at
>> Baidu Inc. It is currently hosted on Github.com under an Apache license at
>> https://github.com/baidu/palo.
>>>
>>> ##External Dependencies
>>>
>>> Palo has the following external dependencies.
>>>
>>> * Google gflags (BSD)
>>> * Google glog (BSD)
>>> * Apache Thrift (Apache Software License v2.0)
>>> * Apache Commons (Apache Software License v2.0)
>>> * Boost (Boost Software License)
>>> * OpenLdap (OpenLDAP Software License)
>>> * rapidjson (Tencent)
>>> * Google RE2 (BSD-style)
>>> * lz4 (BSD)
>>> * snappy (BSD)
>>> * cyrus-sasl (CMU License)
>>> * Twitter Bootstrap (Apache Software License v2.0)
>>> * d3 (BSD)
>>> * LLVM (BSD-like)
>>>
>>> Build and test dependencies:
>>>
>>> * ant (Apache Software License v2.0)
>>> * Apache Maven (Apache Software License v2.0)
>>> * cmake (BSD)
>>> * clang (BSD)
>>> * Google gtest (Apache Software License v2.0)
>>>
>>> ##Required Resources
>>>
>>> ###Mailing List
>>>
>>> There are currently no mailing lists. The usual mailing lists are
>> expected to be set up when entering incubation:
>>>
>>> private@palo.incubator.apache.org<mailto:private@palo.
>> incubator.apache.org>
>>> dev@palo.incubator.apache.org<ma...@palo.incubator.apache.org>
>>> commits@palo.incubator.apache.org<mailto:commits@palo.
>> incubator.apache.org>
>>>
>>> ###Subversion Directory
>>>
>>> Upon entering incubation: https://github.com/baidu/palo.
>>> After incubation, we want to move the existing repo from
>> https://github.com/baidu/palo to Apache infrastructure.
>>>
>>> ###Issue Tracking
>>>
>>> Palo currently uses GitHub to track issues. Would like to continue to do
>> so while we discuss migration possibilities with the ASF Infra committee.
>>>
>>> ###Other Resources
>>>
>>> The existing code already has unit tests so we will make use of existing
>> Apache continuous testing infrastructure. The resulting load should not be
>> very large.
>>>
>>> ##Initial Committers
>>>
>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com<mailto:maruy
>> ue@baidu.com>)
>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com<mailto:bu
>> aa.zhaoc@gmail.com>)
>>> * Mingyu Chen (https://github.com/morningman,chenmingyu@baidu.com)
>>> * De Li（https://github.com/lide-reed, mailtolide@sina.com）<mailto:ma
>> iltolide@sina.com%EF%BC%89>
>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com
>> <ma...@baidu.com>)
>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com<mailto:
>> lichaoyong@baidu.com>)
>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com<mailto:lin
>> gbinlb@gmail.com>)
>>>
>>> ##Affiliations
>>>
>>> The initial committers are employees of Baidu Inc.. The nominated
>> mentors are employees of TODO.
>>>
>>> ##Sponsors
>>>
>>> ###Champion
>>>
>>> TODO
>>>
>>> ###Nominated Mentors
>>>
>>> * sijie guo, guosijie@gmail.com<ma...@gmail.com>
>>> * Luke Han, lukehan@apache.org<ma...@apache.org>
>>> * Zheng Shao, zshao@apache.org<ma...@apache.org>
>>
>> Mentors must be members of the IPMC and almost always Members of the ASF.
>>
>> At this moment only Luke Han is qualified.
>>
>> Regards,
>> Dave
>>
>>>
>>> ###Sponsoring Entity
>>>
>>> We are requesting the Incubator to sponsor this project.
>>
>>

Re: Looking for Champion

Posted by "Tan,Zhongyi" <ta...@baidu.com>.

Thanks, Willem.

We really appreciate it.

> On June 8, 2018, at 23:03, Willem Jiang <wi...@gmail.com> wrote:
> 
> Hi,
> 
> I'm willing to be the Mentor.
> Please count me in.
> 
> 
> 
> Willem Jiang
> 
> Twitter: willemjiang
> Weibo: 姜宁willem
> 
>> On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net> wrote:
>> 
>> Hi -
>> 
>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>> I’ll look at dependency licenses later today. It’s early for me.
>> 
>> 
>>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
>>> 
>>> Hi all,
>>> 
>>> I am Reed, as a developer worked with the team for Palo (a MPP-based
>> interactive SQL data warehousing).
>>> https://github.com/baidu/palo/wiki/Palo-Overview
>>> 
>>> We propose to contribute Palo as an Apache Incubator project, and
>>> we are still looking for possible Champion if anyone would like to
>> volunteer. Thanks a lot.
>>> 
>>> Best Regards,
>>> Reed
>>> 
>>> ===================
>>> The draft of the proposal as below:
>>> 
>>> #Apache Palo
>>> 
>>> ##Abstract
>>> 
>>> Palo is a MPP-based interactive SQL data warehousing for reporting and
>> analysis.
>>> 
>>> ##Proposal
>>> 
>>> We propose to contribute the Palo codebase and associated artifacts
>> (e.g. documentation, web-site content etc.) to the Apache Software
>> Foundation with the intent of forming a productive, meritocratic and open
>> community around Palo’s continued development, according to the ‘Apache
>> Way’.
>>> 
>>> Baidu owns several trademarks regarding Palo, and proposes to transfer
>> ownership of those trademarks in full to the ASF.
>>> 
>>> ###Overview of Palo
>>> 
>>> Palo’s implementation consists of two daemons: Frontend (FE) and Backend
>> (BE).
>>> 
>>> **Frontend daemon** consists of query coordinator and catalog manager.
>> Query coordinator is responsible for receiving users’ sql queries,
>> compiling queries and managing queries execution. Catalog manager is
>> responsible for managing metadata such as databases, tables, partitions,
>> replicas and etc. Several frontend daemons could be deployed to guarantee
>> fault-tolerance, and load balancing.
>>> 
>>> **Backend daemon** stores the data and executes the query fragments.
>> Many backend daemons could also be deployed to provide scalability and
>> fault-tolerance.
>>> 
>>> A typical Palo cluster generally composes of several frontend daemons
>> and dozens to hundreds of backend daemons.
>>> 
>>> Users can use MySQL client tools to connect any frontend daemon to
>> submit SQL query. Frontend receives the query and compiles it into query
>> plans executable by the Backend. Then Frontend sends the query plan
>> fragments to Backend. Backend will build a query execution DAG. Data is
>> fetched and pipelined into the DAG. The final result response is sent to
>> client via Frontend. The distribution of query fragment execution takes
>> minimizing data movement and maximizing scan locality as the main goal.
>>> 
>>> ##Background
>>> 
>>> At Baidu, Prior to Palo, different tools were deployed to solve diverse
>> requirements in many ways. And when a use case requires the simultaneous
>> availability of capabilities that cannot all be provided by a single tool,
>> users were forced to build hybrid architectures that stitch multiple tools
>> together, but we believe that they shouldn’t need to accept such inherent
>> complexity. A storage system built to provide great performance across a
>> broad range of workloads provides a more elegant solution to the problems
>> that hybrid architectures aim to solve. Palo is the solution.
>>> 
>>> Palo is designed to be a simple and single tightly coupled system, not
>> depending on other systems. Palo provides high concurrent low latency point
>> query performance, but also provides high throughput queries of ad-hoc
>> analysis. Palo provides bulk-batch data loading, but also provides near
>> real-time mini-batch data loading. Palo also provides high availability,
>> reliability, fault tolerance, and scalability.
>>> 
>>> ##Rationale
>>> 
>>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>>> 
>>> Mesa is a highly scalable analytic data storage system that stores
>> critical measurement data related to Google's Internet advertising
>> business. Mesa is designed to satisfy complex and challenging set of users’
>> and systems’ requirements, including near real-time data ingestion and
>> query ability, as well as high availability, reliability, fault tolerance,
>> and scalability for large data and query volumes.
>>> 
>>> Impala is a modern, open-source MPP SQL engine architected from the
>> ground up for the Hadoop data processing environment. At present, by virtue
>> of its superior performance and rich functionality, Impala has been
>> comparable to many commercial MPP database query engine. Mesa can satisfy
>> the needs of many of our storage requirements, however Mesa itself does not
>> provide a SQL query engine; Impala is a very good MPP SQL query engine, but
>> the lack of a perfect distributed storage engine. So in the end we chose
>> the combination of these two technologies.
>>> 
>>> Learning from Mesa’s data model, we developed a distributed storage
>> engine. Unlike Mesa, this storage engine does not rely on any distributed
>> file system. Then we deeply integrate this storage engine with Impala query
>> engine. Query compiling, query execution coordination and catalog
>> management of storage engine are integrated to be frontend daemon; query
>> execution and data storage are integrated to be backend daemon. With this
>> integration, we implemented a single, full-featured, high performance state
>> the art of MPP database, as well as maintaining the simplicity.
>>> 
>>> ##Current Status
>>> 
>>> Palo has been an open source project on GitHub (
>> https://github.com/baidu/palo).
>>> 
>>> ###Meritocracy
>>> 
>>> Palo has been deployed in production at Baidu and is applying more than
>> 200 lines of business. It has demonstrated great performance benefits and
>> has proved to be a better way for reporting and analysis based big data.
>> Still We look forward to growing a rich user and developer community.
>>> 
>>> ###Community
>>> 
>>> Palo seeks to develop developer and user communities during incubation.
>>> 
>>> ###Core Developers
>>> 
>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com<mailto:maruy
>> ue@baidu.com>)
>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com<mailto:bu
>> aa.zhaoc@gmail.com>)
>>> * Mingyu Chen (https://github.com/morningman,chenmingyu@baidu.com)
>>> * De Li（https://github.com/lide-reed, mailtolide@sina.com）<mailto:ma
>> iltolide@sina.com%EF%BC%89>
>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com
>> <ma...@baidu.com>)
>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com<mailto:
>> lichaoyong@baidu.com>)
>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com<mailto:lin
>> gbinlb@gmail.com>)
>>> 
>>> ###Alignment
>>> 
>>> Palo is related to several other Apache projects:
>>> 
>>> * Palo can also read data stored in Apache Hadoop clusters powered by
>> the HDFS filesystem.
>>> * Palo is closely integrated with Impala, which is also being proposed
>> to the Incubator.
>> 
>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>> 
>>> * Palo uses Apache Thrift as its RPC and serialization framework of
>> choice.
>>> 
>>> ##Known Risks
>>> 
>>> ###Orphaned Products
>>> 
>>> The core developers of Palo team plan to work full time on this project.
>> There is very little risk of Palo getting orphaned since at least one large
>> company (Baidu) is extensively using it in their production. For example,
>> currently there are more than 200 use cases using Palo in production.
>> Furthermore, since Palo was open sourced at the beginning of October 2017,
>> it has received more than 660 stars and been forked nearly 170 times. We
>> plan to extend and diversify this community further through Apache.
>>> 
>>> ###Inexperience with Open Source
>>> 
>>> The core developers are all active users and followers of open source.
>> They are already committers and contributors to the Palo Github project.
>> All have been involved with the source code that has been released under an
>> open source license, and several of them also have experience developing
>> code in an open source environment. Though the core set of Developers do
>> not have Apache Open Source experience, there are plans to onboard
>> individuals with Apache open source experience on to the project.
>>> 
>>> ###Homogenous Developers
>>> 
>>> The most of core developers are from Baidu, but after Palo was open
>> sourced, Palo received a lot of bug fixes and enhancements from other
>> developers not working at Baidu.
>>> 
>>> ###Reliance on Salaried Developers
>>> 
>>> Baidu invested in Palo as the OLAP solution and some of its key
>> engineers are working full time on the project. In addition, since there is
>> a growing Big Data need for scalable OLAP solutions, we look forward to
>> other Apache developers and researchers to contribute to the project. Also
>> key to addressing the risk associated with relying on Salaried developers
>> from a single entity is to increase the diversity of the contributors and
>> actively lobby for Domain experts in the BI space to contribute. Apache
>> Palo intends to do this.
>>> 
>>> ###An Excessive Fascination with the Apache Brand
>>> 
>>> Palo is proposing to enter incubation at Apache in order to help efforts
>> to diversify the committer-base, not so much to capitalize on the Apache
>> brand. The Palo project is in production use already inside Baidu, but is
>> not expected to be an Baidu product for external customers. As such, the
>> Palo project is not seeking to use the Apache brand as a marketing tool.
>>> 
>>> ##Documentation
>>> 
>>> Information about Palo can be found at https://github.com/baidu/palo.
>> The following links provide more information about Palo in open source:
>>> 
>>> * Palo wiki site: https://github.com/baidu/palo/wiki
>>> * Codebase at Github: https://github.com/baidu/palo
>>> * Issue Tracking: https://github.com/baidu/palo/issues
>>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>>> 
>>> ##Initial Source
>>> 
>>> Palo has been under development since 2017 by a team of engineers at
>> Baidu Inc. It is currently hosted on Github.com under an Apache license at
>> https://github.com/baidu/palo.
>>> 
>>> ##External Dependencies
>>> 
>>> Palo has the following external dependencies.
>>> 
>>> * Google gflags (BSD)
>>> * Google glog (BSD)
>>> * Apache Thrift (Apache Software License v2.0)
>>> * Apache Commons (Apache Software License v2.0)
>>> * Boost (Boost Software License)
>>> * OpenLdap (OpenLDAP Software License)
>>> * rapidjson (Tencent)
>>> * Google RE2 (BSD-style)
>>> * lz4 (BSD)
>>> * snappy (BSD)
>>> * cyrus-sasl (CMU License)
>>> * Twitter Bootstrap (Apache Software License v2.0)
>>> * d3 (BSD)
>>> * LLVM (BSD-like)
>>> 
>>> Build and test dependencies:
>>> 
>>> * ant (Apache Software License v2.0)
>>> * Apache Maven (Apache Software License v2.0)
>>> * cmake (BSD)
>>> * clang (BSD)
>>> * Google gtest (Apache Software License v2.0)
>>> 
>>> ##Required Resources
>>> 
>>> ###Mailing List
>>> 
>>> There are currently no mailing lists. The usual mailing lists are
>> expected to be set up when entering incubation:
>>> 
>>> private@palo.incubator.apache.org<mailto:private@palo.
>> incubator.apache.org>
>>> dev@palo.incubator.apache.org<ma...@palo.incubator.apache.org>
>>> commits@palo.incubator.apache.org<mailto:commits@palo.
>> incubator.apache.org>
>>> 
>>> ###Subversion Directory
>>> 
>>> Upon entering incubation: https://github.com/baidu/palo.
>>> After incubation, we want to move the existing repo from
>> https://github.com/baidu/palo to Apache infrastructure.
>>> 
>>> ###Issue Tracking
>>> 
>>> Palo currently uses GitHub to track issues. Would like to continue to do
>> so while we discuss migration possibilities with the ASF Infra committee.
>>> 
>>> ###Other Resources
>>> 
>>> The existing code already has unit tests so we will make use of existing
>> Apache continuous testing infrastructure. The resulting load should not be
>> very large.
>>> 
>>> ##Initial Committers
>>> 
>>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com<mailto:maruy
>> ue@baidu.com>)
>>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com<mailto:bu
>> aa.zhaoc@gmail.com>)
>>> * Mingyu Chen (https://github.com/morningman,chenmingyu@baidu.com)
>>> * De Li（https://github.com/lide-reed, mailtolide@sina.com）<mailto:ma
>> iltolide@sina.com%EF%BC%89>
>>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com
>> <ma...@baidu.com>)
>>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com<mailto:
>> lichaoyong@baidu.com>)
>>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com<mailto:lin
>> gbinlb@gmail.com>)
>>> 
>>> ##Affiliations
>>> 
>>> The initial committers are employees of Baidu Inc.. The nominated
>> mentors are employees of TODO.
>>> 
>>> ##Sponsors
>>> 
>>> ###Champion
>>> 
>>> TODO
>>> 
>>> ###Nominated Mentors
>>> 
>>> * sijie guo, guosijie@gmail.com<ma...@gmail.com>
>>> * Luke Han, lukehan@apache.org<ma...@apache.org>
>>> * Zheng Shao, zshao@apache.org<ma...@apache.org>
>> 
>> Mentors must be members of the IPMC and almost always Members of the ASF.
>> 
>> At this moment only Luke Han is qualified.
>> 
>> Regards,
>> Dave
>> 
>>> 
>>> ###Sponsoring Entity
>>> 
>>> We are requesting the Incubator to sponsor this project.
>> 
>>

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Thank you, Willem, and a warm welcome.

On 2018/6/8 at 11:03 PM, "Willem Jiang" <wi...@gmail.com> wrote:

>Hi,
>
>I'm willing to be the Mentor.
>Please count me in.
>
>
>
>Willem Jiang
>
>Twitter: willemjiang
>Weibo: 姜宁willem
>
>On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net> wrote:
>
>> Hi -
>>
>> I’m willing to Champion and Mentor. I have a couple of comments inline.
>> I’ll look at dependency licenses later today. It’s early for me.
>>
>>
>> > On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
>> >
>> > Hi all,
>> >
>> > I am Reed, as a developer worked with the team for Palo (a MPP-based
>> interactive SQL data warehousing).
>> > https://github.com/baidu/palo/wiki/Palo-Overview
>> >
>> > We propose to contribute Palo as an Apache Incubator project, and
>> > we are still looking for possible Champion if anyone would like to
>> volunteer. Thanks a lot.
>> >
>> > Best Regards,
>> > Reed
>> >
>> > ===================
>> > The draft of the proposal as below:
>> >
>> > #Apache Palo
>> >
>> > ##Abstract
>> >
>> > Palo is a MPP-based interactive SQL data warehousing for reporting and
>> analysis.
>> >
>> > ##Proposal
>> >
>> > We propose to contribute the Palo codebase and associated artifacts
>> (e.g. documentation, web-site content etc.) to the Apache Software
>> Foundation with the intent of forming a productive, meritocratic and open
>> community around Palo’s continued development, according to the ‘Apache
>> Way’.
>> >
>> > Baidu owns several trademarks regarding Palo, and proposes to transfer
>> ownership of those trademarks in full to the ASF.
>> >
>> > ###Overview of Palo
>> >
>> > Palo’s implementation consists of two daemons: Frontend (FE) and Backend
>> (BE).
>> >
>> > **Frontend daemon** consists of query coordinator and catalog manager.
>> Query coordinator is responsible for receiving users’ sql queries,
>> compiling queries and managing queries execution. Catalog manager is
>> responsible for managing metadata such as databases, tables, partitions,
>> replicas and etc. Several frontend daemons could be deployed to guarantee
>> fault-tolerance, and load balancing.
>> >
>> > **Backend daemon** stores the data and executes the query fragments.
>> Many backend daemons could also be deployed to provide scalability and
>> fault-tolerance.
>> >
>> > A typical Palo cluster generally composes of several frontend daemons
>> and dozens to hundreds of backend daemons.
>> >
>> > Users can use MySQL client tools to connect any frontend daemon to
>> submit SQL query. Frontend receives the query and compiles it into query
>> plans executable by the Backend. Then Frontend sends the query plan
>> fragments to Backend. Backend will build a query execution DAG. Data is
>> fetched and pipelined into the DAG. The final result response is sent to
>> client via Frontend. The distribution of query fragment execution takes
>> minimizing data movement and maximizing scan locality as the main goal.
>> >
>> > ##Background
>> >
>> > At Baidu, Prior to Palo, different tools were deployed to solve diverse
>> requirements in many ways. And when a use case requires the simultaneous
>> availability of capabilities that cannot all be provided by a single tool,
>> users were forced to build hybrid architectures that stitch multiple tools
>> together, but we believe that they shouldn’t need to accept such inherent
>> complexity. A storage system built to provide great performance across a
>> broad range of workloads provides a more elegant solution to the problems
>> that hybrid architectures aim to solve. Palo is the solution.
>> >
>> > Palo is designed to be a simple and single tightly coupled system, not
>> depending on other systems. Palo provides high concurrent low latency point
>> query performance, but also provides high throughput queries of ad-hoc
>> analysis. Palo provides bulk-batch data loading, but also provides near
>> real-time mini-batch data loading. Palo also provides high availability,
>> reliability, fault tolerance, and scalability.
>> >
>> > ##Rationale
>> >
>> > Palo mainly integrates the technology of Google Mesa and Apache Impala.
>> >
>> > Mesa is a highly scalable analytic data storage system that stores
>> critical measurement data related to Google's Internet advertising
>> business. Mesa is designed to satisfy complex and challenging set of users’
>> and systems’ requirements, including near real-time data ingestion and
>> query ability, as well as high availability, reliability, fault tolerance,
>> and scalability for large data and query volumes.
>> >
>> > Impala is a modern, open-source MPP SQL engine architected from the
>> ground up for the Hadoop data processing environment. At present, by virtue
>> of its superior performance and rich functionality, Impala has been
>> comparable to many commercial MPP database query engine. Mesa can satisfy
>> the needs of many of our storage requirements, however Mesa itself does not
>> provide a SQL query engine; Impala is a very good MPP SQL query engine, but
>> the lack of a perfect distributed storage engine. So in the end we chose
>> the combination of these two technologies.
>> >
>> > Learning from Mesa’s data model, we developed a distributed storage
>> > engine. Unlike Mesa, this storage engine does not rely on any distributed
>> > file system. Then we deeply integrate this storage engine with Impala query
>> > engine. Query compiling, query execution coordination and catalog
>> > management of storage engine are integrated to be frontend daemon; query
>> > execution and data storage are integrated to be backend daemon. With this
>> > integration, we implemented a single, full-featured, high performance state
>> > the art of MPP database, as well as maintaining the simplicity.
>> >
>> > ##Current Status
>> >
>> > Palo has been an open source project on GitHub
>> > (https://github.com/baidu/palo).
>> >
>> > ###Meritocracy
>> >
>> > Palo has been deployed in production at Baidu and is applying more than
>> > 200 lines of business. It has demonstrated great performance benefits and
>> > has proved to be a better way for reporting and analysis based big data.
>> > Still We look forward to growing a rich user and developer community.
>> >
>> > ###Community
>> >
>> > Palo seeks to develop developer and user communities during incubation.
>> >
>> > ###Core Developers
>> >
>> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> >
>> > ###Alignment
>> >
>> > Palo is related to several other Apache projects:
>> >
>> > * Palo can also read data stored in Apache Hadoop clusters powered by
>> >   the HDFS filesystem.
>> > * Palo is closely integrated with Impala, which is also being proposed
>> >   to the Incubator.
>>
>> Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>>
>> > * Palo uses Apache Thrift as its RPC and serialization framework of
>> >   choice.
>> >
>> > ##Known Risks
>> >
>> > ###Orphaned Products
>> >
>> > The core developers of Palo team plan to work full time on this project.
>> > There is very little risk of Palo getting orphaned since at least one large
>> > company (Baidu) is extensively using it in their production. For example,
>> > currently there are more than 200 use cases using Palo in production.
>> > Furthermore, since Palo was open sourced at the beginning of October 2017,
>> > it has received more than 660 stars and been forked nearly 170 times. We
>> > plan to extend and diversify this community further through Apache.
>> >
>> > ###Inexperience with Open Source
>> >
>> > The core developers are all active users and followers of open source.
>> > They are already committers and contributors to the Palo Github project.
>> > All have been involved with the source code that has been released under an
>> > open source license, and several of them also have experience developing
>> > code in an open source environment. Though the core set of Developers do
>> > not have Apache Open Source experience, there are plans to onboard
>> > individuals with Apache open source experience on to the project.
>> >
>> > ###Homogenous Developers
>> >
>> > The most of core developers are from Baidu, but after Palo was open
>> > sourced, Palo received a lot of bug fixes and enhancements from other
>> > developers not working at Baidu.
>> >
>> > ###Reliance on Salaried Developers
>> >
>> > Baidu invested in Palo as the OLAP solution and some of its key
>> > engineers are working full time on the project. In addition, since there is
>> > a growing Big Data need for scalable OLAP solutions, we look forward to
>> > other Apache developers and researchers to contribute to the project. Also
>> > key to addressing the risk associated with relying on Salaried developers
>> > from a single entity is to increase the diversity of the contributors and
>> > actively lobby for Domain experts in the BI space to contribute. Apache
>> > Palo intends to do this.
>> >
>> > ###An Excessive Fascination with the Apache Brand
>> >
>> > Palo is proposing to enter incubation at Apache in order to help efforts
>> > to diversify the committer-base, not so much to capitalize on the Apache
>> > brand. The Palo project is in production use already inside Baidu, but is
>> > not expected to be an Baidu product for external customers. As such, the
>> > Palo project is not seeking to use the Apache brand as a marketing tool.
>> >
>> > ##Documentation
>> >
>> > Information about Palo can be found at https://github.com/baidu/palo.
>> > The following links provide more information about Palo in open source:
>> >
>> > * Palo wiki site: https://github.com/baidu/palo/wiki
>> > * Codebase at Github: https://github.com/baidu/palo
>> > * Issue Tracking: https://github.com/baidu/palo/issues
>> > * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> > * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> >
>> > ##Initial Source
>> >
>> > Palo has been under development since 2017 by a team of engineers at
>> > Baidu Inc. It is currently hosted on Github.com under an Apache license at
>> > https://github.com/baidu/palo.
>> >
>> > ##External Dependencies
>> >
>> > Palo has the following external dependencies.
>> >
>> > * Google gflags (BSD)
>> > * Google glog (BSD)
>> > * Apache Thrift (Apache Software License v2.0)
>> > * Apache Commons (Apache Software License v2.0)
>> > * Boost (Boost Software License)
>> > * OpenLdap (OpenLDAP Software License)
>> > * rapidjson (Tencent)
>> > * Google RE2 (BSD-style)
>> > * lz4 (BSD)
>> > * snappy (BSD)
>> > * cyrus-sasl (CMU License)
>> > * Twitter Bootstrap (Apache Software License v2.0)
>> > * d3 (BSD)
>> > * LLVM (BSD-like)
>> >
>> > Build and test dependencies:
>> >
>> > * ant (Apache Software License v2.0)
>> > * Apache Maven (Apache Software License v2.0)
>> > * cmake (BSD)
>> > * clang (BSD)
>> > * Google gtest (Apache Software License v2.0)
>> >
>> > ##Required Resources
>> >
>> > ###Mailing List
>> >
>> > There are currently no mailing lists. The usual mailing lists are
>> > expected to be set up when entering incubation:
>> >
>> > private@palo.incubator.apache.org
>> > dev@palo.incubator.apache.org
>> > commits@palo.incubator.apache.org
>> >
>> > ###Subversion Directory
>> >
>> > Upon entering incubation: https://github.com/baidu/palo.
>> > After incubation, we want to move the existing repo from
>> > https://github.com/baidu/palo to Apache infrastructure.
>> >
>> > ###Issue Tracking
>> >
>> > Palo currently uses GitHub to track issues. Would like to continue to do
>> > so while we discuss migration possibilities with the ASF Infra committee.
>> >
>> > ###Other Resources
>> >
>> > The existing code already has unit tests so we will make use of existing
>> > Apache continuous testing infrastructure. The resulting load should not be
>> > very large.
>> >
>> > ##Initial Committers
>> >
>> > * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> > * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> > * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> > * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> > * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> > * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> > * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> >
>> > ##Affiliations
>> >
>> > The initial committers are employees of Baidu Inc.. The nominated
>> > mentors are employees of TODO.
>> >
>> > ##Sponsors
>> >
>> > ###Champion
>> >
>> > TODO
>> >
>> > ###Nominated Mentors
>> >
>> > * sijie guo, guosijie@gmail.com
>> > * Luke Han, lukehan@apache.org
>> > * Zheng Shao, zshao@apache.org
>>
>> Mentors must be members of the IPMC and almost always Members of the ASF.
>>
>> At this moment only Luke Han is qualified.
>>
>> Regards,
>> Dave
>>
>> >
>> > ###Sponsoring Entity
>> >
>> > We are requesting the Incubator to sponsor this project.
>>
>>

Re: Looking for Champion

Posted by Willem Jiang <wi...@gmail.com>.

Hi,

I'm willing to be the Mentor.
Please count me in.



Willem Jiang

Twitter: willemjiang
Weibo: 姜宁willem

On Fri, Jun 8, 2018 at 8:59 PM, Dave Fisher <da...@comcast.net> wrote:

> Hi -
>
> I’m willing to Champion and Mentor. I have a couple of comments inline.
> I’ll look at dependency licenses later today. It’s early for me.

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Dave,

Thank you very much for your help, and a warm welcome as Palo’s Champion
and Mentor.
Regarding licenses, these are the issues we know of so far:

------
1. aes/* (mysql-5.6): GPL v2.1
2. util/mysql_dtoa.cpp (Percona Server for MySQL): GPL
3. http/mongoose.h (mongoose): MIT License
------


We will resolve these ASAP.



On 2018/6/8 at 8:59 PM, "Dave Fisher" <da...@comcast.net> wrote:

>Hi -
>
>I’m willing to Champion and Mentor. I have a couple of comments inline.
>I’ll look at dependency licenses later today. It’s early for me.
>
>
>> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
>> 
>> Hi all,
>> 
>> I am Reed, as a developer worked with the team for Palo (a MPP-based
>>interactive SQL data warehousing).
>> https://github.com/baidu/palo/wiki/Palo-Overview
>> 
>> We propose to contribute Palo as an Apache Incubator project, and
>> we are still looking for possible Champion if anyone would like to
>>volunteer. Thanks a lot.
>> 
>> Best Regards,
>> Reed
>> 
>> ===================
>> The draft of the proposal as below:
>> 
>> #Apache Palo
>> 
>> ##Abstract
>> 
>> Palo is a MPP-based interactive SQL data warehousing for reporting and
>>analysis.
>> 
>> ##Proposal
>> 
>> We propose to contribute the Palo codebase and associated artifacts
>>(e.g. documentation, web-site content etc.) to the Apache Software
>>Foundation with the intent of forming a productive, meritocratic and
>>open community around Palo’s continued development, according to the
>>‘Apache Way’.
>> 
>> Baidu owns several trademarks regarding Palo, and proposes to transfer
>>ownership of those trademarks in full to the ASF.
>> 
>> ###Overview of Palo
>> 
>> Palo’s implementation consists of two daemons: Frontend (FE) and
>>Backend (BE).
>> 
>> **Frontend daemon** consists of query coordinator and catalog manager.
>>Query coordinator is responsible for receiving users’ sql queries,
>>compiling queries and managing queries execution. Catalog manager is
>>responsible for managing metadata such as databases, tables, partitions,
>>replicas and etc. Several frontend daemons could be deployed to
>>guarantee fault-tolerance, and load balancing.
>> 
>> **Backend daemon** stores the data and executes the query fragments.
>>Many backend daemons could also be deployed to provide scalability and
>>fault-tolerance.
>> 
>> A typical Palo cluster generally composes of several frontend daemons
>>and dozens to hundreds of backend daemons.
>> 
>> Users can use MySQL client tools to connect any frontend daemon to
>>submit SQL query. Frontend receives the query and compiles it into query
>>plans executable by the Backend. Then Frontend sends the query plan
>>fragments to Backend. Backend will build a query execution DAG. Data is
>>fetched and pipelined into the DAG. The final result response is sent to
>>client via Frontend. The distribution of query fragment execution takes
>>minimizing data movement and maximizing scan locality as the main goal.
>> 
>> ##Background
>> 
>> At Baidu, Prior to Palo, different tools were deployed to solve diverse
>>requirements in many ways. And when a use case requires the simultaneous
>>availability of capabilities that cannot all be provided by a single
>>tool, users were forced to build hybrid architectures that stitch
>>multiple tools together, but we believe that they shouldn’t need to
>>accept such inherent complexity. A storage system built to provide great
>>performance across a broad range of workloads provides a more elegant
>>solution to the problems that hybrid architectures aim to solve. Palo is
>>the solution.
>> 
>> Palo is designed to be a simple and single tightly coupled system, not
>>depending on other systems. Palo provides high concurrent low latency
>>point query performance, but also provides high throughput queries of
>>ad-hoc analysis. Palo provides bulk-batch data loading, but also
>>provides near real-time mini-batch data loading. Palo also provides high
>>availability, reliability, fault tolerance, and scalability.
>> 
>> ##Rationale
>> 
>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>> 
>> Mesa is a highly scalable analytic data storage system that stores
>>critical measurement data related to Google's Internet advertising
>>business. Mesa is designed to satisfy complex and challenging set of
>>users’ and systems’ requirements, including near real-time data
>>ingestion and query ability, as well as high availability, reliability,
>>fault tolerance, and scalability for large data and query volumes.
>> 
>> Impala is a modern, open-source MPP SQL engine architected from the
>>ground up for the Hadoop data processing environment. At present, by
>>virtue of its superior performance and rich functionality， Impala has
>>been comparable to many commercial MPP database query engine. Mesa can
>>satisfy the needs of many of our storage requirements, however Mesa
>>itself does not provide a SQL query engine; Impala is a very good MPP
>>SQL query engine, but the lack of a perfect distributed storage engine.
>>So in the end we chose the combination of these two technologies.
>> 
>> Learning from Mesa’s data model, we developed a distributed storage
>>engine. Unlike Mesa, this storage engine does not rely on any
>>distributed file system. Then we deeply integrate this storage engine
>>with Impala query engine. Query compiling, query execution coordination
>>and catalog management of storage engine are integrated to be frontend
>>daemon; query execution and data storage are integrated to be backend
>>daemon. With this integration, we implemented a single, full-featured,
>>high performance state the art of MPP database, as well as maintaining
>>the simplicity.
>> 
>> ##Current Status
>> 
>> Palo has been an open source project on GitHub
>>(https://github.com/baidu/palo).
>> 
>> ###Meritocracy
>> 
>> Palo has been deployed in production at Baidu and is applying more than
>>200 lines of business. It has demonstrated great performance benefits
>>and has proved to be a better way for reporting and analysis based big
>>data. Still We look forward to growing a rich user and developer
>>community.
>> 
>> ###Community
>> 
>> Palo seeks to develop developer and user communities during incubation.
>> 
>> ###Core Developers
>> 
>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> 
>> ###Alignment
>> 
>> Palo is related to several other Apache projects:
>> 
>> * Palo can also read data stored in Apache Hadoop clusters powered by
>>the HDFS filesystem.
>> * Palo is closely integrated with Impala, which is also being proposed
>>to the Incubator.
>
>Apache Impala has completed Incubation. Jim Apple is VP, Impala.
>
>> * Palo uses Apache Thrift as its RPC and serialization framework of
>>choice.
>> 
>> ##Known Risks
>> 
>> ###Orphaned Products
>> 
>> The core developers of Palo team plan to work full time on this
>>project. There is very little risk of Palo getting orphaned since at
>>least one large company (Baidu) is extensively using it in their
>>production. For example, currently there are more than 200 use cases
>>using Palo in production. Furthermore, since Palo was open sourced at
>>the beginning of October 2017, it has received more than 660 stars and
>>been forked nearly 170 times. We plan to extend and diversify this
>>community further through Apache.
>> 
>> ###Inexperience with Open Source
>> 
>> The core developers are all active users and followers of open source.
>>They are already committers and contributors to the Palo Github project.
>>All have been involved with the source code that has been released under
>>an open source license, and several of them also have experience
>>developing code in an open source environment. Though the core set of
>>Developers do not have Apache Open Source experience, there are plans to
>>onboard individuals with Apache open source experience on to the project.
>> 
>> ###Homogenous Developers
>> 
>> The most of core developers are from Baidu, but after Palo was open
>>sourced, Palo received a lot of bug fixes and enhancements from other
>>developers not working at Baidu.
>> 
>> ###Reliance on Salaried Developers
>> 
>> Baidu invested in Palo as the OLAP solution and some of its key
>>engineers are working full time on the project. In addition, since there
>>is a growing Big Data need for scalable OLAP solutions, we look forward
>>to other Apache developers and researchers to contribute to the project.
>>Also key to addressing the risk associated with relying on Salaried
>>developers from a single entity is to increase the diversity of the
>>contributors and actively lobby for Domain experts in the BI space to
>>contribute. Apache Palo intends to do this.
>> 
>> ###An Excessive Fascination with the Apache Brand
>> 
>> Palo is proposing to enter incubation at Apache in order to help
>> efforts to diversify the committer base, not so much to capitalize on
>> the Apache brand. The Palo project is already in production use inside
>> Baidu, but is not expected to be a Baidu product for external
>> customers. As such, the Palo project is not seeking to use the Apache
>> brand as a marketing tool.
>> 
>> ##Documentation
>> 
>> Information about Palo can be found at https://github.com/baidu/palo.
>>The following links provide more information about Palo in open source:
>> 
>> * Palo wiki site: https://github.com/baidu/palo/wiki
>> * Codebase at Github: https://github.com/baidu/palo
>> * Issue Tracking: https://github.com/baidu/palo/issues
>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> 
>> ##Initial Source
>> 
>> Palo has been under development since 2017 by a team of engineers at
>>Baidu Inc. It is currently hosted on Github.com under an Apache license
>>at https://github.com/baidu/palo.
>> 
>> ##External Dependencies
>> 
>> Palo has the following external dependencies.
>> 
>> * Google gflags (BSD)
>> * Google glog (BSD)
>> * Apache Thrift (Apache Software License v2.0)
>> * Apache Commons (Apache Software License v2.0)
>> * Boost (Boost Software License)
>> * OpenLDAP (OpenLDAP Software License)
>> * rapidjson (Tencent)
>> * Google RE2 (BSD-style)
>> * lz4 (BSD)
>> * snappy (BSD)
>> * cyrus-sasl (CMU License)
>> * Twitter Bootstrap (Apache Software License v2.0)
>> * d3 (BSD)
>> * LLVM (BSD-like)
>> 
>> Build and test dependencies:
>> 
>> * ant (Apache Software License v2.0)
>> * Apache Maven (Apache Software License v2.0)
>> * cmake (BSD)
>> * clang (BSD)
>> * Google gtest (Apache Software License v2.0)
>> 
>> ##Required Resources
>> 
>> ###Mailing List
>> 
>> There are currently no mailing lists. The usual mailing lists are
>> expected to be set up when entering incubation:
>>
>> private@palo.incubator.apache.org
>> dev@palo.incubator.apache.org
>> commits@palo.incubator.apache.org
>> 
>> ###Subversion Directory
>> 
>> Upon entering incubation: https://github.com/baidu/palo.
>> After incubation, we want to move the existing repo from
>>https://github.com/baidu/palo to Apache infrastructure.
>> 
>> ###Issue Tracking
>> 
>> Palo currently uses GitHub to track issues. We would like to continue to
>> do so while we discuss migration possibilities with the ASF Infra
>> committee.
>> 
>> ###Other Resources
>> 
>> The existing code already has unit tests so we will make use of
>>existing Apache continuous testing infrastructure. The resulting load
>>should not be very large.
>> 
>> ##Initial Committers
>> 
>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> 
>> ##Affiliations
>> 
>> The initial committers are employees of Baidu Inc. The nominated
>> mentors are employees of TODO.
>> 
>> ##Sponsors
>> 
>> ###Champion
>> 
>> TODO
>> 
>> ###Nominated Mentors
>> 
>> * Sijie Guo, guosijie@gmail.com
>> * Luke Han, lukehan@apache.org
>> * Zheng Shao, zshao@apache.org
>
>Mentors must be members of the IPMC and almost always Members of the ASF.
>
>At this moment only Luke Han is qualified.
>
>Regards,
>Dave
>
>> 
>> ###Sponsoring Entity
>> 
>> We are requesting the Incubator to sponsor this project.
>

Re: Looking for Champion

Posted by "Tan,Zhongyi" <ta...@baidu.com>.

Great, Dave, we will add you as Champion.

Thanks

> On Jun 8, 2018, at 20:59, Dave Fisher <da...@comcast.net> wrote:
> 
> Hi -
> 
> I’m willing to Champion and Mentor. I have a couple of comments inline. I’ll look at dependency licenses later today. It’s early for me.
> 
> 

---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
For additional commands, e-mail: general-help@incubator.apache.org

Re: Looking for Champion

Posted by Dave Fisher <da...@comcast.net>.

Hi -

I’m willing to Champion and Mentor. I have a couple of comments inline. I’ll look at dependency licenses later today. It’s early for me.


> On Jun 7, 2018, at 9:45 PM, Li,De(BDG) <li...@baidu.com> wrote:
> 
> Hi all,
> 
> I am Reed, a developer working with the team on Palo (an MPP-based interactive SQL data warehouse).
> https://github.com/baidu/palo/wiki/Palo-Overview
> 
> We propose to contribute Palo as an Apache Incubator project, and
> we are still looking for a Champion, if anyone would like to volunteer. Thanks a lot.
> 
> Best Regards,
> Reed
> 
> ===================
> The draft of the proposal is below:
> 
> #Apache Palo
> 
> ##Abstract
> 
> Palo is an MPP-based interactive SQL data warehouse for reporting and analysis.
> 
> ##Proposal
> 
> We propose to contribute the Palo codebase and associated artifacts (e.g. documentation, web-site content etc.) to the Apache Software Foundation with the intent of forming a productive, meritocratic and open community around Palo’s continued development, according to the ‘Apache Way’.
> 
> Baidu owns several trademarks regarding Palo, and proposes to transfer ownership of those trademarks in full to the ASF.
> 
> ###Overview of Palo
> 
> Palo’s implementation consists of two daemons: Frontend (FE) and Backend (BE).
> 
> **Frontend daemon** consists of a query coordinator and a catalog manager. The query coordinator is responsible for receiving users’ SQL queries, compiling them and managing their execution. The catalog manager is responsible for managing metadata such as databases, tables, partitions and replicas. Several frontend daemons can be deployed to provide fault tolerance and load balancing.
> 
> **Backend daemon** stores the data and executes the query fragments. Many backend daemons can be deployed to provide scalability and fault tolerance.
> 
> A typical Palo cluster consists of several frontend daemons and dozens to hundreds of backend daemons.
> 
> Users can use MySQL client tools to connect to any frontend daemon and submit SQL queries. The frontend receives a query and compiles it into query plans executable by the backend, then sends the query plan fragments to the backends. Each backend builds a query execution DAG; data is fetched and pipelined into the DAG. The final result is sent to the client via the frontend. The distribution of query fragment execution takes minimizing data movement and maximizing scan locality as its main goals.
> 
> ##Background
> 
> At Baidu, prior to Palo, different tools were deployed to meet diverse requirements. When a use case required capabilities that could not all be provided by a single tool, users were forced to build hybrid architectures that stitch multiple tools together. We believe they shouldn’t need to accept such inherent complexity. A storage system built to provide great performance across a broad range of workloads is a more elegant solution to the problems that hybrid architectures aim to solve. Palo is that solution.
> 
> Palo is designed to be a simple, single, tightly coupled system that does not depend on other systems. Palo provides high-concurrency, low-latency point queries as well as high-throughput ad-hoc analytical queries. It supports bulk batch data loading as well as near real-time mini-batch data loading. Palo also provides high availability, reliability, fault tolerance, and scalability.
> 
> ##Rationale
> 
> Palo mainly integrates the technology of Google Mesa and Apache Impala.
> 
> Mesa is a highly scalable analytic data storage system that stores critical measurement data related to Google's Internet advertising business. Mesa is designed to satisfy a complex and challenging set of users’ and systems’ requirements, including near real-time data ingestion and queryability, as well as high availability, reliability, fault tolerance, and scalability for large data and query volumes.
> 
> Impala is a modern, open-source MPP SQL engine architected from the ground up for the Hadoop data processing environment. By virtue of its superior performance and rich functionality, Impala is now comparable to many commercial MPP database query engines. Mesa can satisfy many of our storage requirements, but it does not provide a SQL query engine; Impala is a very good MPP SQL query engine, but it lacks a perfect distributed storage engine. So in the end we chose to combine the two technologies.
> 
> Learning from Mesa’s data model, we developed a distributed storage engine. Unlike Mesa, this storage engine does not rely on any distributed file system. We then deeply integrated this storage engine with the Impala query engine. Query compiling, query execution coordination and catalog management of the storage engine are integrated into the frontend daemon; query execution and data storage are integrated into the backend daemon. With this integration, we implemented a single, full-featured, high-performance, state-of-the-art MPP database while maintaining simplicity.
> 
> ##Current Status
> 
> Palo is an open source project on GitHub (https://github.com/baidu/palo).
> 
> ###Meritocracy
> 
> Palo has been deployed in production at Baidu and serves more than 200 lines of business. It has demonstrated great performance benefits and has proved to be a better way to do reporting and analysis on big data. Still, we look forward to growing a rich user and developer community.
> 
> ###Community
> 
> Palo seeks to develop developer and user communities during incubation.
> 
> ###Core Developers
> 
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> 
> ###Alignment
> 
> Palo is related to several other Apache projects:
> 
> * Palo can also read data stored in Apache Hadoop clusters powered by the HDFS filesystem.
> * Palo is closely integrated with Impala, which is also being proposed to the Incubator.

Apache Impala has completed Incubation. Jim Apple is VP, Impala.

> * Palo uses Apache Thrift as its RPC and serialization framework of choice.
> 
> ##Known Risks
> 
> ###Orphaned Products
> 
> The core developers of the Palo team plan to work full time on this project. There is very little risk of Palo being orphaned, since at least one large company (Baidu) uses it extensively in production. For example, there are currently more than 200 use cases using Palo in production. Furthermore, since Palo was open sourced at the beginning of October 2017, it has received more than 660 stars and been forked nearly 170 times. We plan to extend and diversify this community further through Apache.
> 
> ###Inexperience with Open Source
> 
> The core developers are all active users and followers of open source. They are already committers and contributors to the Palo GitHub project. All have been involved with source code that has been released under an open source license, and several of them also have experience developing code in an open source environment. Though the core developers do not have Apache open source experience, there are plans to onboard individuals with Apache open source experience onto the project.
> 
> ###Homogenous Developers
> 
> Most of the core developers are from Baidu, but since Palo was open sourced, it has received many bug fixes and enhancements from developers not working at Baidu.
> 
> ###Reliance on Salaried Developers
> 
> Baidu has invested in Palo as its OLAP solution, and some of its key engineers are working full time on the project. In addition, since there is a growing big data need for scalable OLAP solutions, we look forward to other Apache developers and researchers contributing to the project. Key to addressing the risk of relying on salaried developers from a single entity is to increase the diversity of the contributors and to actively lobby for domain experts in the BI space to contribute. Apache Palo intends to do this.
> 
> ###An Excessive Fascination with the Apache Brand
> 
> Palo is proposing to enter incubation at Apache in order to help efforts to diversify the committer base, not so much to capitalize on the Apache brand. The Palo project is already in production use inside Baidu, but is not expected to be a Baidu product for external customers. As such, the Palo project is not seeking to use the Apache brand as a marketing tool.
> 
> ##Documentation
> 
> Information about Palo can be found at https://github.com/baidu/palo. The following links provide more information about Palo in open source:
> 
> * Palo wiki site: https://github.com/baidu/palo/wiki
> * Codebase at Github: https://github.com/baidu/palo
> * Issue Tracking: https://github.com/baidu/palo/issues
> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
> 
> ##Initial Source
> 
> Palo has been under development since 2017 by a team of engineers at Baidu Inc. It is currently hosted on Github.com under an Apache license at https://github.com/baidu/palo.
> 
> ##External Dependencies
> 
> Palo has the following external dependencies.
> 
> * Google gflags (BSD)
> * Google glog (BSD)
> * Apache Thrift (Apache Software License v2.0)
> * Apache Commons (Apache Software License v2.0)
> * Boost (Boost Software License)
> * OpenLDAP (OpenLDAP Software License)
> * rapidjson (Tencent)
> * Google RE2 (BSD-style)
> * lz4 (BSD)
> * snappy (BSD)
> * cyrus-sasl (CMU License)
> * Twitter Bootstrap (Apache Software License v2.0)
> * d3 (BSD)
> * LLVM (BSD-like)
> 
> Build and test dependencies:
> 
> * ant (Apache Software License v2.0)
> * Apache Maven (Apache Software License v2.0)
> * cmake (BSD)
> * clang (BSD)
> * Google gtest (Apache Software License v2.0)
> 
> ##Required Resources
> 
> ###Mailing List
> 
> There are currently no mailing lists. The usual mailing lists are expected to be set up when entering incubation:
> 
> private@palo.incubator.apache.org
> dev@palo.incubator.apache.org
> commits@palo.incubator.apache.org
> 
> ###Subversion Directory
> 
> Upon entering incubation: https://github.com/baidu/palo.
> After incubation, we want to move the existing repo from https://github.com/baidu/palo to Apache infrastructure.
> 
> ###Issue Tracking
> 
> Palo currently uses GitHub to track issues. We would like to continue to do so while we discuss migration possibilities with the ASF Infra committee.
> 
> ###Other Resources
> 
> The existing code already has unit tests so we will make use of existing Apache continuous testing infrastructure. The resulting load should not be very large.
> 
> ##Initial Committers
> 
> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> 
> ##Affiliations
> 
> The initial committers are employees of Baidu Inc. The nominated mentors are employees of TODO.
> 
> ##Sponsors
> 
> ###Champion
> 
> TODO
> 
> ###Nominated Mentors
> 
> * Sijie Guo, guosijie@gmail.com
> * Luke Han, lukehan@apache.org
> * Zheng Shao, zshao@apache.org

Mentors must be members of the IPMC and almost always Members of the ASF.

At this moment only Luke Han is qualified.

Regards,
Dave

> 
> ###Sponsoring Entity
> 
> We are requesting the Incubator to sponsor this project.

Re: Looking for Champion

Posted by Luke Han <lu...@gmail.com>.

I think this should be left to the team contributing to this project. The
projects can serve different purposes but can also work together, if the
architecture is flexible enough.

On Tue, Jun 19, 2018 at 5:39 AM, Jim Apple <jb...@cloudera.com.invalid> wrote:

> Let me respond specifically to a few of these as a way to, I hope,
> inspire the Palo community to reconsider contributing to Impala. It
> could be a great opportunity for us to produce value by keeping the
> query engine working smoothly while the Palo community can focus more
> of their efforts on the storage system. There is some analogue here
> with how Impala works on other storage systems.
>
> > Firstly, as a query engine for Hadoop, Impala deeply depends on HDFS and
> > HBase (at least, it did several years ago)
>
> Impala can run on other storage. See, for instance
> http://impala.apache.org/docs/build/html/topics/impala_kudu.html and
> http://impala.apache.org/docs/build/html/topics/impala_isilon.html
>
> > Secondly, because we introduced the Mesa data model, the catalog is
> > different from Impala's. We developed an in-memory catalog and also
> > support rollup and an aggregation data model. As a consequence, we had
> > to change the Impala-based SQL grammar.
>
> Impala supports catalog data cached in memory, and adding new features
> to Impala's SQL grammar is not forbidden. I think one of my first
> largish contributions changed the grammar.
>
> > Thirdly, there is a big difference in cluster management and node
> > deployment. In contrast to Impala, query compiling, query execution
> > coordination and catalog management of the storage engine are
> > integrated into the frontend daemon. Query execution and data storage
> > are integrated into the backend daemon.
>
> I'm not sure I understand - how is Palo different here?
>

Re: Looking for Champion

Posted by Jim Apple <jb...@cloudera.com.INVALID>.

Let me respond specifically to a few of these as a way to, I hope,
inspire the Palo community to reconsider contributing to Impala. It
could be a great opportunity for us to produce value by keeping the
query engine working smoothly while the Palo community can focus more
of their efforts on the storage system. There is some analogue here
with how Impala works on other storage systems.

> Firstly, as a query engine for Hadoop, Impala deeply depends on HDFS and
> HBase (at least, it did several years ago)

Impala can run on other storage. See, for instance
http://impala.apache.org/docs/build/html/topics/impala_kudu.html and
http://impala.apache.org/docs/build/html/topics/impala_isilon.html

> Secondly, due to introduced Mesa data model. The Catalog is different from
> Impala.
> We developped a In-Memory Catalog and also support Rollup, aggregation
> data
> model. As a consequnce, we have to change sql grammar based on Impala.

Impala supports catalog data cached in memory, and adding new features
to Impala's SQL grammar is not forbidden. I think one of my first
largish contributions changed the grammar.

> Thirdly, it is a big difference in Cluster manager and node deployment.
> Contrast Impala, Query compiling, query execution coordination and catalog
> management of storage engine are integrated to be frontend daemon.
> Query execution and data storage are integrated to be backend daemon.

I'm not sure I understand - how is Palo different here?


Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Tim, Todd,

Thank you for your response.

We are sorry that we have not contributed any improvements to Impala so
far.
I think we will do that soon. It is a good opportunity for us to
participate in the open source community and learn to do things the
Apache way.

One of the reasons is that we thought most of our patches might not be
accepted by Impala.
Because there is a big difference between Palo and Impala, our patches
could only apply to Palo.

Firstly, as a query engine for Hadoop, Impala depends deeply on HDFS and
HBase (at least it did several years ago), but Palo is just the opposite.
We strive to build a single tool which does not depend on any other
system.
Simplicity (of developing, deploying and using) and meeting many data
serving requirements in a single system are the main features of Palo.
So we want only the query engine from Impala, not other capabilities such
as reading and writing Hive data.

Secondly, because we introduced the Mesa data model, the catalog is
different from Impala's.
We developed an in-memory catalog and also support rollup and an
aggregation data model. As a consequence, we had to change the SQL
grammar inherited from Impala.
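
To make the rollup and aggregation model concrete, here is a minimal
sketch in Python. It assumes a Mesa-style table whose rows have key
columns and one value column combined by summation. The column names
(date, country, clicks) and the aggregate helper are hypothetical and are
not taken from Palo's code.

```python
from collections import defaultdict


def aggregate(rows, key_cols, value_col):
    """Sum value_col over all rows that share the same key_cols."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        totals[key] += row[value_col]
    return dict(totals)


# Hypothetical input rows; two of them share the same full key.
rows = [
    {"date": "2018-06-01", "country": "CN", "clicks": 3},
    {"date": "2018-06-01", "country": "CN", "clicks": 2},
    {"date": "2018-06-01", "country": "US", "clicks": 4},
    {"date": "2018-06-02", "country": "CN", "clicks": 1},
]

# Base table: rows with identical (date, country) keys are merged.
base = aggregate(rows, ["date", "country"], "clicks")

# Rollup: the same data pre-aggregated on a subset of the key columns.
rollup_by_date = aggregate(rows, ["date"], "clicks")
```

A query that groups only by date could then be answered from the smaller
rollup instead of the base table.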

Thirdly, there is a big difference in cluster management and node
deployment.
In contrast to Impala, query compiling, query execution coordination and
catalog management of the storage engine are integrated into the frontend
daemon.
Query execution and data storage are integrated into the backend daemon.

Now, as you mentioned, since Impala's goal is also to be a full-featured
data warehouse engine, maybe some of Palo's features would also be useful
to Impala.
If it is possible, we are very happy to contribute code to Impala.
We very much appreciate the Impala community and we look forward to
cooperating with the Impala community in whatever way.

Best Regards,
Reed



On 2018/6/9 at 12:18 AM, "Tim Armstrong" <ta...@cloudera.com> wrote:

>> Meanwhile we found Impala is a very good MPP SQL query engine, so we
>>integrated
>them together.
>
>Palo didn't integrate with Impala, it forked Impala's codebase and
>embedded
>it in its own repository. I don't remember any attempts from the Palo team
>to engage with the Impala community or attempt to work with us to
>contribute any improvements.
>
>It looks like Palo is still pulling in new code from Impala.  E.g. this
>commit includes a bunch of code I wrote as part of IMPALA-3200:
>https://github.com/baidu/palo/commit/2419384e8a211f10e7636afc6d3423700ba22b5a#diff-1c501d9a8b5c3d1d1cce48d5e1fb0edf
>
>The code isn't owned by any individual, I contributed it to Apache and
>it's
>free for anyone to do what they want to do with it, but pulling in
>improvements from other projects without any attempt to attribute it or
>contribute improvements back seems contrary to the Apache way.
>
>Anyway, maybe incubation is an opportunity for us to work together, but
>I'd
>hope that if Palo does go into incubation that it will rethink some of the
>practices it's been following.
>
>On Fri, Jun 8, 2018 at 9:12 AM, Todd Lipcon <to...@cloudera.com> wrote:
>
>> On Thu, Jun 7, 2018 at 11:55 PM, Li,De(BDG) <li...@baidu.com> wrote:
>>
>> > Hi, Jim
>> >
>> > Thank you for your response.
>> > Actually, we start Palo in several years ago, and that time we
>>developed
>> > the storage engine based on Mesa technology.
>> > Meanwhile we found Impala is a very good MPP SQL query engine, so we
>> > integrated them together.
>> >
>>
>> From what I can tell of the Palo source, it's not so much an
>>integration as
>> a copied-and-modified codebase, right? i.e Palo does not use Impala as a
>> dependency, but rather shares a lot of code from the Impala project that
>> has since diverged.
>>
>>
>> >
>> > With this integration, the goal of Palo is to implement a single,
>> > full-featured, mysql protocol compatible data warehousing.
>> >
>>
>> That sounds pretty similar to the goals of the Impala project. Impala
>>isn't
>> MySQL-compatible at the moment but that seems more like a particular
>> feature that could be added rather than a distinct identity of the
>>project.
>> Otherwise, Impala's goal is to be a full featured data warehouse engine
>>as
>> well.
>>
>> Generally Apache has no rules against multiple projects fulfilling
>>similar
>> goals or use cases, even when those projects might compete. However I
>>think
>> it would be relatively unusual to incubate a project that appears to be
>> derived from a fork of an existing project, at least without first
>> considering whether the additional feature set could be contributed
>>back to
>> the existing community.
>>
>> -Todd
>>
>>
>> > On 2018/6/8 at 1:55 PM, "Jim Apple" <jb...@apache.org> wrote:
>> >
>> > >Hello! As a contributor to Impala, I’d be interested in hearing
>>thoughts
>> > >from the Palo community about integration between Impala and Palo.
>> > >
>> > >For instance, are there any apparent design goals of Impala that the
>> Palo
>> > >community thinks are fundamentally incompatible with Palo?
>> > >
>> > >Thanks,
>> > >Jim
>> > >
>> > >On 2018/06/08 04:45:32, "Li,De(BDG)" <li...@baidu.com> wrote:
>> > >> Hi all,
>> > >>
>> > >> I am Reed, as a developer worked with the team for Palo (a
>>MPP-based
>> > >>interactive SQL data warehousing).
>> > >> https://github.com/baidu/palo/wiki/Palo-Overview
>> > >>
>> > >> We propose to contribute Palo as an Apache Incubator project, and
>> > >> we are still looking for possible Champion if anyone would like to
>> > >>volunteer. Thanks a lot.
>> > >>
>> > >> Best Regards,
>> > >> Reed
>> > >>
>> > >> ===================
>> > >> The draft of the proposal as below:
>> > >>
>> > >> #Apache Palo
>> > >>
>> > >> ##Abstract
>> > >>
>> > >> Palo is a MPP-based interactive SQL data warehousing for reporting
>>and
>> > >>analysis.
>> > >>
>> > >> ##Proposal
>> > >>
>> > >> We propose to contribute the Palo codebase and associated artifacts
>> > >>(e.g. documentation, web-site content etc.) to the Apache Software
>> > >>Foundation with the intent of forming a productive, meritocratic and
>> > >>open community around Palo’s continued development, according to the
>> > >>‘Apache Way’.
>> > >>
>> > >> Baidu owns several trademarks regarding Palo, and proposes to
>>transfer
>> > >>ownership of those trademarks in full to the ASF.
>> > >>
>> > >> ###Overview of Palo
>> > >>
>> > >> Palo’s implementation consists of two daemons: Frontend (FE) and
>> > >>Backend (BE).
>> > >>
>> > >> **Frontend daemon** consists of query coordinator and catalog
>>manager.
>> > >>Query coordinator is responsible for receiving users’ sql queries,
>> > >>compiling queries and managing queries execution. Catalog manager is
>> > >>responsible for managing metadata such as databases, tables,
>> partitions,
>> > >>replicas and etc. Several frontend daemons could be deployed to
>> > >>guarantee fault-tolerance, and load balancing.
>> > >>
>> > >> **Backend daemon** stores the data and executes the query
>>fragments.
>> > >>Many backend daemons could also be deployed to provide scalability
>>and
>> > >>fault-tolerance.
>> > >>
>> > >> A typical Palo cluster generally composes of several frontend
>>daemons
>> > >>and dozens to hundreds of backend daemons.
>> > >>
>> > >> Users can use MySQL client tools to connect any frontend daemon to
>> > >>submit SQL query. Frontend receives the query and compiles it into
>> query
>> > >>plans executable by the Backend. Then Frontend sends the query plan
>> > >>fragments to Backend. Backend will build a query execution DAG.
>>Data is
>> > >>fetched and pipelined into the DAG. The final result response is
>>sent
>> to
>> > >>client via Frontend. The distribution of query fragment execution
>>takes
>> > >>minimizing data movement and maximizing scan locality as the main
>>goal.
>> > >>
>> > >> ##Background
>> > >>
>> > >> At Baidu, Prior to Palo, different tools were deployed to solve
>> diverse
>> > >>requirements in many ways. And when a use case requires the
>> simultaneous
>> > >>availability of capabilities that cannot all be provided by a single
>> > >>tool, users were forced to build hybrid architectures that stitch
>> > >>multiple tools together, but we believe that they shouldn’t need to
>> > >>accept such inherent complexity. A storage system built to provide
>> great
>> > >>performance across a broad range of workloads provides a more
>>elegant
>> > >>solution to the problems that hybrid architectures aim to solve.
>>Palo
>> is
>> > >>the solution.
>> > >>
>> > >> Palo is designed to be a simple and single tightly coupled system,
>>not
>> > >>depending on other systems. Palo provides high concurrent low
>>latency
>> > >>point query performance, but also provides high throughput queries
>>of
>> > >>ad-hoc analysis. Palo provides bulk-batch data loading, but also
>> > >>provides near real-time mini-batch data loading. Palo also provides
>> high
>> > >>availability, reliability, fault tolerance, and scalability.
>> > >>
>> > >> ##Rationale
>> > >>
>> > >> Palo mainly integrates the technology of Google Mesa and Apache
>> Impala.
>> > >>
>> > >> Mesa is a highly scalable analytic data storage system that stores
>> > >>critical measurement data related to Google's Internet advertising
>> > >>business. Mesa is designed to satisfy complex and challenging set of
>> > >>users’ and systems’ requirements, including near real-time data
>> > >>ingestion and query ability, as well as high availability,
>>reliability,
>> > >>fault tolerance, and scalability for large data and query volumes.
>> > >>
>> > >> Impala is a modern, open-source MPP SQL engine architected from the
>> > >>ground up for the Hadoop data processing environment. At present, by
>> > >>virtue of its superior performance and rich functionality， Impala
>>has
>> > >>been comparable to many commercial MPP database query engine. Mesa
>>can
>> > >>satisfy the needs of many of our storage requirements, however Mesa
>> > >>itself does not provide a SQL query engine; Impala is a very good
>>MPP
>> > >>SQL query engine, but the lack of a perfect distributed storage
>>engine.
>> > >>So in the end we chose the combination of these two technologies.
>> > >>
>> > >> Learning from Mesa’s data model, we developed a distributed storage
>> > >>engine. Unlike Mesa, this storage engine does not rely on any
>> > >>distributed file system. Then we deeply integrate this storage
>>engine
>> > >>with Impala query engine. Query compiling, query execution
>>coordination
>> > >>and catalog management of storage engine are integrated to be
>>frontend
>> > >>daemon; query execution and data storage are integrated to be
>>backend
>> > >>daemon. With this integration, we implemented a single,
>>full-featured,
>> > >>high performance state the art of MPP database, as well as
>>maintaining
>> > >>the simplicity.
>> > >>
>> > >> ##Current Status
>> > >>
>> > >> Palo has been an open source project on GitHub
>> > >>(https://github.com/baidu/palo).
>> > >>
>> > >> ###Meritocracy
>> > >>
>> > >> Palo has been deployed in production at Baidu and is applying more
>> than
>> > >>200 lines of business. It has demonstrated great performance
>>benefits
>> > >>and has proved to be a better way for reporting and analysis based
>>big
>> > >>data. Still We look forward to growing a rich user and developer
>> > >>community.
>> > >>
>> > >> ###Community
>> > >>
>> > >> Palo seeks to develop developer and user communities during
>> incubation.
>> > >>
>> > >> ###Core Developers
>> > >>
>> > >> * Ruyue Ma (https://github.com/maruyue,
>> > >>maruyue@baidu.com<ma...@baidu.com>)
>> > >> * Chun Zhao (https://github.com/imay,
>> > >>buaa.zhaoc@gmail.com<ma...@gmail.com>)
>> > >> * Mingyu Chen (https://github.com/morningman,chenmingyu@baidu.com)
>> > >> * De Li（https://github.com/lide-reed,
>> > >>mailtolide@sina.com）<mailto:mailtolide@sina.com%EF%BC%89>
>> > >> * Hao Chen (https://github.com/chenhao7253886,
>> > >>chenhao16@baidu.com<ma...@baidu.com>)
>> > >> * Chaoyong Li (https://github.com/cyongli,
>> > >>lichaoyong@baidu.com<ma...@baidu.com>)
>> > >> * Bin Lin (https://github.com/lingbin,
>> > >>lingbinlb@gmail.com<ma...@gmail.com>)
>> > >>
>> > >> ###Alignment
>> > >>
>> > >> Palo is related to several other Apache projects:
>> > >>
>> > >> * Palo can also read data stored in Apache Hadoop clusters powered
>>by
>> > >>the HDFS filesystem.
>> > >> * Palo is closely integrated with Impala, which is also being
>>proposed
>> > >>to the Incubator.
>> > >> * Palo uses Apache Thrift as its RPC and serialization framework of
>> > >>choice.
>> > >>
>> > >> ##Known Risks
>> > >>
>> > >> ###Orphaned Products
>> > >>
>> > >> The core developers of Palo team plan to work full time on this
>> > >>project. There is very little risk of Palo getting orphaned since at
>> > >>least one large company (Baidu) is extensively using it in their
>> > >>production. For example, currently there are more than 200 use cases
>> > >>using Palo in production. Furthermore, since Palo was open sourced
>>at
>> > >>the beginning of October 2017, it has received more than 660 stars
>>and
>> > >>been forked nearly 170 times. We plan to extend and diversify this
>> > >>community further through Apache.
>> > >>
>> > >> ###Inexperience with Open Source
>> > >>
>> > >> The core developers are all active users and followers of open
>>source.
>> > >>They are already committers and contributors to the Palo Github
>> project.
>> > >>All have been involved with the source code that has been released
>> under
>> > >>an open source license, and several of them also have experience
>> > >>developing code in an open source environment. Though the core set
>>of
>> > >>Developers do not have Apache Open Source experience, there are
>>plans
>> to
>> > >>onboard individuals with Apache open source experience on to the
>> project.
>> > >>
>> > >> ###Homogenous Developers
>> > >>
>> > >> The most of core developers are from Baidu, but after Palo was open
>> > >>sourced, Palo received a lot of bug fixes and enhancements from
>>other
>> > >>developers not working at Baidu.
>> > >>
>> > >> ###Reliance on Salaried Developers
>> > >>
>> > >> Baidu invested in Palo as the OLAP solution and some of its key
>> > >>engineers are working full time on the project. In addition, since
>> there
>> > >>is a growing Big Data need for scalable OLAP solutions, we look
>>forward
>> > >>to other Apache developers and researchers to contribute to the
>> project.
>> > >>Also key to addressing the risk associated with relying on Salaried
>> > >>developers from a single entity is to increase the diversity of the
>> > >>contributors and actively lobby for Domain experts in the BI space
>>to
>> > >>contribute. Apache Palo intends to do this.
>> > >>
>> > >> ###An Excessive Fascination with the Apache Brand
>> > >>
>> > >> Palo is proposing to enter incubation at Apache in order to help
>> > >>efforts to diversify the committer-base, not so much to capitalize
>>on
>> > >>the Apache brand. The Palo project is in production use already
>>inside
>> > >>Baidu, but is not expected to be an Baidu product for external
>> > >>customers. As such, the Palo project is not seeking to use the
>>Apache
>> > >>brand as a marketing tool.
>> > >>
>> > >> ##Documentation
>> > >>
>> > >> Information about Palo can be found at
>>https://github.com/baidu/palo.
>> > >>The following links provide more information about Palo in open
>>source:
>> > >>
>> > >> * Palo wiki site: https://github.com/baidu/palo/wiki
>> > >> * Codebase at Github: https://github.com/baidu/palo
>> > >> * Issue Tracking: https://github.com/baidu/palo/issues
>> > >> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> > >> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> > >>
>> > >> ##Initial Source
>> > >>
>> > >> Palo has been under development since 2017 by a team of engineers
>>at
>> > >>Baidu Inc. It is currently hosted on Github.com under an Apache
>>license
>> > >>at https://github.com/baidu/palo.
>> > >>
>> > >> ##External Dependencies
>> > >>
>> > >> Palo has the following external dependencies.
>> > >>
>> > >> * Google gflags (BSD)
>> > >> * Google glog (BSD)
>> > >> * Apache Thrift (Apache Software License v2.0)
>> > >> * Apache Commons (Apache Software License v2.0)
>> > >> * Boost (Boost Software License)
>> > >> * OpenLdap (OpenLDAP Software License)
>> > >> * rapidjson (Tencent)
>> > >> * Google RE2 (BSD-style)
>> > >> * lz4 (BSD)
>> > >> * snappy (BSD)
>> > >> * cyrus-sasl (CMU License)
>> > >> * Twitter Bootstrap (Apache Software License v2.0)
>> > >> * d3 (BSD)
>> > >> * LLVM (BSD-like)
>> > >>
>> > >> Build and test dependencies:
>> > >>
>> > >> * ant (Apache Software License v2.0)
>> > >> * Apache Maven (Apache Software License v2.0)
>> > >> * cmake (BSD)
>> > >> * clang (BSD)
>> > >> * Google gtest (Apache Software License v2.0)
>> > >>
>> > >> ##Required Resources
>> > >>
>> > >> ###Mailing List
>> > >>
>> > >> There are currently no mailing lists. The usual mailing lists are
>> > >>expected to be set up when entering incubation:
>> > >>
>> > >>
>> > >>private@palo.incubator.apache.org<mailto:private@
>> > palo.incubator.apache.or
>> > >>g>
>> > >> dev@palo.incubator.apache.org<ma...@palo.incubator.apache.org>
>> > >>
>> > >>commits@palo.incubator.apache.org<mailto:commits@
>> > palo.incubator.apache.or
>> > >>g>
>> > >>
>> > >> ###Subversion Directory
>> > >>
>> > >> Upon entering incubation: https://github.com/baidu/palo.
>> > >> After incubation, we want to move the existing repo from
>> > >>https://github.com/baidu/palo to Apache infrastructure.
>> > >>
>> > >> ###Issue Tracking
>> > >>
>> > >> Palo currently uses GitHub to track issues. Would like to continue
>>to
>> > >>do so while we discuss migration possibilities with the ASF Infra
>> > >>committee.
>> > >>
>> > >> ###Other Resources
>> > >>
>> > >> The existing code already has unit tests so we will make use of
>> > >>existing Apache continuous testing infrastructure. The resulting
>>load
>> > >>should not be very large.
>> > >>
>> > >> ##Initial Committers
>> > >>
>> > >> * Ruyue Ma (https://github.com/maruyue,
>> > >>maruyue@baidu.com<ma...@baidu.com>)
>> > >> * Chun Zhao (https://github.com/imay,
>> > >>buaa.zhaoc@gmail.com<ma...@gmail.com>)
>> > >> * Mingyu Chen (https://github.com/morningman,chenmingyu@baidu.com)
>> > >> * De Li（https://github.com/lide-reed,
>> > >>mailtolide@sina.com）<mailto:mailtolide@sina.com%EF%BC%89>
>> > >> * Hao Chen (https://github.com/chenhao7253886,
>> > >>chenhao16@baidu.com<ma...@baidu.com>)
>> > >> * Chaoyong Li (https://github.com/cyongli,
>> > >>lichaoyong@baidu.com<ma...@baidu.com>)
>> > >> * Bin Lin (https://github.com/lingbin,
>> > >>lingbinlb@gmail.com<ma...@gmail.com>)
>> > >>
>> > >> ##Affiliations
>> > >>
>> > >> The initial committers are employees of Baidu Inc.. The nominated
>> > >>mentors are employees of TODO.
>> > >>
>> > >> ##Sponsors
>> > >>
>> > >> ###Champion
>> > >>
>> > >> TODO
>> > >>
>> > >> ###Nominated Mentors
>> > >>
>> > >> * sijie guo, guosijie@gmail.com<ma...@gmail.com>
>> > >> * Luke Han, lukehan@apache.org<ma...@apache.org>
>> > >> * Zheng Shao, zshao@apache.org<ma...@apache.org>
>> > >>
>> > >> ###Sponsoring Entity
>> > >>
>> > >> We are requesting the Incubator to sponsor this project.
>> > >>
>> > >
>> > >---------------------------------------------------------------------
>> > >To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> > >For additional commands, e-mail: general-help@incubator.apache.org
>> > >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>> > For additional commands, e-mail: general-help@incubator.apache.org
>> >
>>
>>
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
>>



Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Dave,

Thank you for your response.

As you mentioned regarding mongoose.h, it was a serious mistake to replace its license when we
updated files to the Apache license with an automatic script.

I have fixed it as follows:
https://github.com/baidu/palo/commit/611afcd125dc136c58d7feb5552c26e9b215878a

By the way, Palo only uses OpenLDAP in binary form. Does that still raise a license issue?

Best Regards,
Reed

From: Dave Fisher <da...@comcast.net>
Reply-To: <ge...@incubator.apache.org>
Date: Saturday, June 9, 2018, 2:10 AM
To: <ge...@incubator.apache.org>
Subject: Re: Looking for Champion

Yuck. That’s a mess. That is one very large diff.

I see a few files related to AES that were converted from GPL to Apache, which is not allowed.
Copyrights were changed too which is also incorrect.

Changes to this file be/src/http/mongoose.h<https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e#diff-586168bd25cfbf3bc8bc1b52abc4206c> violate license and copyright of Sergey Lyubka

GitHub makes you expand each diff after a while.

There are dependency licenses that might be issues too.

These licenses have not been evaluated by LEGAL.
* OpenLdap (OpenLDAP Software License)
http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=LICENSE;hb=e5f8117f0ce088d0bd7a8e18ddf37eaa40eb09b1
* rapidjson (Tencent)
Unknown
* cyrus-sasl (CMU License)
https://spdx.org/licenses/MIT-CMU.html
AKA MIT-CMU

Lots of work in evaluating licenses.

On Jun 8, 2018, at 9:46 AM, Ted Dunning <te...@gmail.com> wrote:

Ouch.

The copyright in question was attached to code from the source code for
mySQL. There is no way that code can be in an Apache project.

Given the cut and paste history, it seems like it will require a very
detailed audit of code history or web searches to find where the original
code came from. The my_aes.c and .h files, for instance, have no hint in
their history that they came from GPL'ed code.

Yeah. Lots of oversight.

If we accept this proposal we need a Mentor who has time to help with this mess.

I don’t know that I have the time to lead that effort. Anyone?

Regards,
Dave

On Fri, Jun 8, 2018 at 5:37 PM Todd Lipcon <to...@cloudera.com> wrote:

...

+1. Also briefly browsing the code I found suspicious commits like this
one:

https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e

... in which a GPL license copyright by Oracle was "fixed" to be an Apache
license copyright Baidu.

So if this project does enter incubation I think we should be extra careful
to audit the origins of all of the source code.

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

   Copyrights were changed too which is also incorrect.

Yes, we know that. I have fixed this mistake as follows:
https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7

As you mentioned, we will recheck whether OpenLDAP is actually necessary
for Palo.

Best Regards,
Reed


On 2018/6/9 at 4:13 AM, "Ted Dunning" <te...@gmail.com> wrote:

>Open LDAP is a form of copy-left. It requires source code distribution of
>binary packaged versions.
>
>
>
>On Fri, Jun 8, 2018 at 7:10 PM Dave Fisher <da...@comcast.net> wrote:
>
>> Yuck. That’s a mess. That is one very large diff.
>>
>> I see a few files related to AES the were GPL converted to Apache which
>> not allowed.
>> Copyrights were changed too which is also incorrect.
>>
>> Changes to this file be/src/http/mongoose.h
>> <https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e#diff-586168bd25cfbf3bc8bc1b52abc4206c> violate
>> license and copyright of Sergey Lyubka
>>
>> GitHub makes you expand each diff after awhile.
>>
>> There are dependency licenses that might be issues too.
>>
>> These licenses have not been evaluated by LEGAL.
>> * OpenLdap (OpenLDAP Software License)
>> http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=LICENSE;hb=e5f8117f0ce088d0bd7a8e18ddf37eaa40eb09b1
>> * rapidjson (Tencent)
>> Unknown
>> * cyrus-sasl (CMU License)
>> https://spdx.org/licenses/MIT-CMU.html
>> AKA MIT-CMU
>>
>> Lots of work in evaluating licenses.
>>
>> On Jun 8, 2018, at 9:46 AM, Ted Dunning <te...@gmail.com> wrote:
>>
>> Ouch.
>>
>> The copyright in question was attached to code from the source code for
>> mySQL. There is no way that code can be in an Apache project.
>>
>> Given the cut and paste history, it seems like it will require a very
>> detailed audit of code history or web searches to find where the
>>original
>> code came from. The my_aes.c and .h files, for instance, have no hint in
>> their history that they came from GPL'ed code.
>>
>>
>> Yeah. Lot’s of oversight.
>>
>> If we accept this proposal we need a Mentor who has time to help with
>>this
>> mess.
>>
>> I don’t know that I have the time to lead that effort. Anyone?
>>
>> Regards,
>> Dave
>>
>>
>> On Fri, Jun 8, 2018 at 5:37 PM Todd Lipcon <to...@cloudera.com> wrote:
>>
>> ...
>>
>> +1. Also briefly browsing the code I found suspicious commits like this
>> one:
>>
>>
>> https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e
>>
>> ... in which a GPL license copyright by Oracle was "fixed" to be an
>>Apache
>> license copyright Baidu.
>>
>> So if this project does enter incubation I think we should be extra
>>careful
>> to audit the origins of all of the source code.
>>
>>
>>
>>

Re: Looking for Champion

Posted by Ted Dunning <te...@gmail.com>.

Open LDAP is a form of copy-left. It requires source code distribution of
binary packaged versions.



On Fri, Jun 8, 2018 at 7:10 PM Dave Fisher <da...@comcast.net> wrote:

> Yuck. That’s a mess. That is one very large diff.
>
> I see a few files related to AES the were GPL converted to Apache which
> not allowed.
> Copyrights were changed too which is also incorrect.
>
> Changes to this file be/src/http/mongoose.h
> <https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e#diff-586168bd25cfbf3bc8bc1b52abc4206c> violate
> license and copyright of Sergey Lyubka
>
> GitHub makes you expand each diff after awhile.
>
> There are dependency licenses that might be issues too.
>
> These licenses have not been evaluated by LEGAL.
> * OpenLdap (OpenLDAP Software License)
>
> http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=LICENSE;hb=e5f8117f0ce088d0bd7a8e18ddf37eaa40eb09b1
> * rapidjson (Tencent)
> Unknown
> * cyrus-sasl (CMU License)
> https://spdx.org/licenses/MIT-CMU.html
> AKA MIT-CMU
>
> Lots of work in evaluating licenses.
>
> On Jun 8, 2018, at 9:46 AM, Ted Dunning <te...@gmail.com> wrote:
>
> Ouch.
>
> The copyright in question was attached to code from the source code for
> mySQL. There is no way that code can be in an Apache project.
>
> Given the cut and paste history, it seems like it will require a very
> detailed audit of code history or web searches to find where the original
> code came from. The my_aes.c and .h files, for instance, have no hint in
> their history that they came from GPL'ed code.
>
>
> Yeah. Lot’s of oversight.
>
> If we accept this proposal we need a Mentor who has time to help with this
> mess.
>
> I don’t know that I have the time to lead that effort. Anyone?
>
> Regards,
> Dave
>
>
> On Fri, Jun 8, 2018 at 5:37 PM Todd Lipcon <to...@cloudera.com> wrote:
>
> ...
>
> +1. Also briefly browsing the code I found suspicious commits like this
> one:
>
>
> https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e
>
> ... in which a GPL license copyright by Oracle was "fixed" to be an Apache
> license copyright Baidu.
>
> So if this project does enter incubation I think we should be extra careful
> to audit the origins of all of the source code.
>
>
>
>

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Regarding the license questions, we will complete the fixes as soon as possible, before voting.

From: Dave Fisher <da...@comcast.net>
Reply-To: <ge...@incubator.apache.org>
Date: Saturday, June 9, 2018, 2:10 AM
To: <ge...@incubator.apache.org>
Subject: Re: Looking for Champion

Yuck. That’s a mess. That is one very large diff.

I see a few files related to AES that were converted from GPL to Apache, which is not allowed.
Copyrights were changed too which is also incorrect.

GitHub makes you expand each diff after a while.

There are dependency licenses that might be issues too.

Lots of work in evaluating licenses.

On Jun 8, 2018, at 9:46 AM, Ted Dunning <te...@gmail.com> wrote:

Ouch.

The copyright in question was attached to code from the source code for
mySQL. There is no way that code can be in an Apache project.

Yeah. Lots of oversight.

If we accept this proposal we need a Mentor who has time to help with this mess.

I don’t know that I have the time to lead that effort. Anyone?

Regards,
Dave

On Fri, Jun 8, 2018 at 5:37 PM Todd Lipcon <to...@cloudera.com> wrote:

...

+1. Also briefly browsing the code I found suspicious commits like this
one:

https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e

... in which a GPL license copyright by Oracle was "fixed" to be an Apache
license copyright Baidu.

So if this project does enter incubation I think we should be extra careful
to audit the origins of all of the source code.

Re: Looking for Champion

Posted by Dave Fisher <da...@comcast.net>.

Yuck. That’s a mess. That is one very large diff.

I see a few files related to AES that were converted from GPL to Apache, which is not allowed.
Copyrights were changed too which is also incorrect.

Changes to this file be/src/http/mongoose.h <https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e#diff-586168bd25cfbf3bc8bc1b52abc4206c> violate license and copyright of Sergey Lyubka

GitHub makes you expand each diff after a while.

There are dependency licenses that might be issues too.

These licenses have not been evaluated by LEGAL.
* OpenLdap (OpenLDAP Software License)
	http://www.openldap.org/devel/gitweb.cgi?p=openldap.git;a=blob;f=LICENSE;hb=e5f8117f0ce088d0bd7a8e18ddf37eaa40eb09b1
* rapidjson (Tencent)
	Unknown
* cyrus-sasl (CMU License)
	https://spdx.org/licenses/MIT-CMU.html
	AKA MIT-CMU

Lots of work in evaluating licenses.

> On Jun 8, 2018, at 9:46 AM, Ted Dunning <te...@gmail.com> wrote:
> 
> Ouch.
> 
> The copyright in question was attached to code from the source code for
> mySQL. There is no way that code can be in an Apache project.
> 
> Given the cut and paste history, it seems like it will require a very
> detailed audit of code history or web searches to find where the original
> code came from. The my_aes.c and .h files, for instance, have no hint in
> their history that they came from GPL'ed code.

Yeah. Lots of oversight.

If we accept this proposal we need a Mentor who has time to help with this mess.

I don’t know that I have the time to lead that effort. Anyone?

Regards,
Dave

> 
> On Fri, Jun 8, 2018 at 5:37 PM Todd Lipcon <to...@cloudera.com> wrote:
> 
>> ...
>> 
>> +1. Also briefly browsing the code I found suspicious commits like this
>> one:
>> 
>> https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e
>> 
>> ... in which a GPL license copyright by Oracle was "fixed" to be an Apache
>> license copyright Baidu.
>> 
>> So if this project does enter incubation I think we should be extra careful
>> to audit the origins of all of the source code.
>> 
>>

Re: Looking for Champion

Posted by Ted Dunning <te...@gmail.com>.

Ouch.

The copyright in question was attached to code from the source code for
MySQL. There is no way that code can be in an Apache project.

Given the cut and paste history, it seems like it will require a very
detailed audit of code history or web searches to find where the original
code came from. The my_aes.c and .h files, for instance, have no hint in
their history that they came from GPL'ed code.

On Fri, Jun 8, 2018 at 5:37 PM Todd Lipcon <to...@cloudera.com> wrote:

> ...
>
> +1. Also briefly browsing the code I found suspicious commits like this
> one:
>
> https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e
>
> ... in which a GPL license copyright by Oracle was "fixed" to be an Apache
> license copyright Baidu.
>
> So if this project does enter incubation I think we should be extra careful
> to audit the origins of all of the source code.
>
>
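
The audit discussed above could begin with a mechanical pass over license headers. A minimal sketch in Python, assuming that foreign code can be recognised by a couple of header strings; the marker list, `scan_header`, and `scan_tree` are hypothetical names for illustration and are not part of Palo or of any Apache tooling:

```python
import re
from pathlib import Path

# Hypothetical marker patterns; a real audit would need a much longer list.
FOREIGN_MARKERS = {
    "GPL": re.compile(r"GNU General Public License", re.IGNORECASE),
    "Oracle copyright": re.compile(r"Copyright\s*\(c\).*Oracle", re.IGNORECASE),
}


def scan_header(text, max_lines=40):
    """Return the names of the markers found in the first max_lines lines."""
    header = "\n".join(text.splitlines()[:max_lines])
    return [name for name, pattern in FOREIGN_MARKERS.items()
            if pattern.search(header)]


def scan_tree(root, suffixes=(".c", ".h", ".cpp", ".java")):
    """Map each flagged source file under root to the markers in its header."""
    flagged = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            hits = scan_header(path.read_text(errors="replace"))
            if hits:
                flagged[str(path)] = hits
    return flagged
```

As Ted points out, a file whose header was already rewritten would pass this check, so a scan like this would need to be run against older revisions as well, and it cannot replace a manual review of where the code came from.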

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Todd,

Thank you for your response.

It was a serious mistake: the Oracle license was replaced with the Apache
license when we updated the license headers with a script.

We did not check carefully. In fact, those files are no longer used,
so I removed them and made a new commit.

https://github.com/baidu/palo/commit/ac770c33d445a4c18a0b74f56b28a4180b30bfb7

Best Regards,
Reed


On 2018/6/9 at 12:37 AM, "Todd Lipcon" <to...@cloudera.com> wrote:

>On Fri, Jun 8, 2018 at 9:18 AM, Tim Armstrong <ta...@cloudera.com> wrote:
>
>> > Meanwhile we found Impala is a very good MPP SQL query engine, so we
>> > integrated them together.
>>
>> Palo didn't integrate with Impala, it forked Impala's codebase and embedded
>> it in its own repository. I don't remember any attempts from the Palo team
>> to engage with the Impala community or attempt to work with us to
>> contribute any improvements.
>>
>> It looks like Palo is still pulling in new code from Impala.  E.g. this
>> commit includes a bunch of code I wrote as part of IMPALA-3200:
>> https://github.com/baidu/palo/commit/2419384e8a211f10e7636afc6d3423700ba22b5a#diff-1c501d9a8b5c3d1d1cce48d5e1fb0edf
>>
>> The code isn't owned by any individual, I contributed it to Apache and it's
>> free for anyone to do what they want to do with it, but pulling in
>> improvements from other projects without any attempt to attribute it or
>> contribute improvements back seems contrary to the Apache way.
>>
>
>+1. Also briefly browsing the code I found suspicious commits like this
>one:
>https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e
>
>... in which a GPL license copyright by Oracle was "fixed" to be an Apache
>license copyright Baidu.
>
>So if this project does enter incubation I think we should be extra careful
>to audit the origins of all of the source code.
>
>-Todd
>
>
>> On Fri, Jun 8, 2018 at 9:12 AM, Todd Lipcon <to...@cloudera.com> wrote:
>>
>> > On Thu, Jun 7, 2018 at 11:55 PM, Li,De(BDG) <li...@baidu.com> wrote:
>> >
>> > > Hi, Jim
>> > >
>> > > Thank you for your response.
>> > > Actually, we start Palo in several years ago, and that time we developed
>> > > the storage engine based on Mesa technology.
>> > > Meanwhile we found Impala is a very good MPP SQL query engine, so we
>> > > integrated them together.
>> > >
>> >
>> > From what I can tell of the Palo source, it's not so much an integration as
>> > a copied-and-modified codebase, right? i.e Palo does not use Impala as a
>> > dependency, but rather shares a lot of code from the Impala project that
>> > has since diverged.
>> >
>> >
>> > >
>> > > With this integration, the goal of Palo is to implement a single,
>> > > full-featured, mysql protocol compatible data warehousing.
>> > >
>> >
>> > That sounds pretty similar to the goals of the Impala project. Impala isn't
>> > MySQL-compatible at the moment but that seems more like a particular
>> > feature that could be added rather than a distinct identity of the project.
>> > Otherwise, Impala's goal is to be a full featured data warehouse engine as
>> > well.
>> >
>> > Generally Apache has no rules against multiple projects fulfilling similar
>> > goals or use cases, even when those projects might compete. However I think
>> > it would be relatively unusual to incubate a project that appears to be
>> > derived from a fork of an existing project, at least without first
>> > considering whether the additional feature set could be contributed back to
>> > the existing community.
>> >
>> > -Todd
>> >
>> >
>> > > On 2018/6/8 at 1:55 PM, "Jim Apple" <jb...@apache.org> wrote:
>> > >
>> > > >Hello! As a contributor to Impala, I’d be interested in hearing thoughts
>> > > >from the Palo community about integration between Impala and Palo.
>> > > >
>> > > >For instance, are there any apparent design goals of Impala that the Palo
>> > > >community thinks are fundamentally incompatible with Palo?
>> > > >
>> > > >Thanks,
>> > > >Jim
>> > > >
>> >
>> >
>> >
>> > --
>> > Todd Lipcon
>> > Software Engineer, Cloudera
>> >
>>
>
>
>
>-- 
>Todd Lipcon
>Software Engineer, Cloudera

Re: Looking for Champion

Posted by Todd Lipcon <to...@cloudera.com>.

On Fri, Jun 8, 2018 at 9:18 AM, Tim Armstrong <ta...@cloudera.com>
wrote:

> > Meanwhile we found Impala is a very good MPP SQL query engine, so we
> > integrated them together.
>
> Palo didn't integrate with Impala, it forked Impala's codebase and embedded
> it in its own repository. I don't remember any attempts from the Palo team
> to engage with the Impala community or attempt to work with us to
> contribute any improvements.
>
> It looks like Palo is still pulling in new code from Impala.  E.g. this
> commit includes a bunch of code I wrote as part of IMPALA-3200:
> https://github.com/baidu/palo/commit/2419384e8a211f10e7636afc6d3423700ba22b5a#diff-1c501d9a8b5c3d1d1cce48d5e1fb0edf
>
> The code isn't owned by any individual, I contributed it to Apache and it's
> free for anyone to do what they want to do with it, but pulling in
> improvements from other projects without any attempt to attribute it or
> contribute improvements back seems contrary to the Apache way.
>

+1. Also briefly browsing the code I found suspicious commits like this one:
https://github.com/baidu/palo/commit/6486be64c319fe0beb8c6b4430c1662de54f182e

... in which a GPL license copyright by Oracle was "fixed" to be an Apache
license copyright Baidu.

So if this project does enter incubation I think we should be extra careful
to audit the origins of all of the source code.

-Todd


> On Fri, Jun 8, 2018 at 9:12 AM, Todd Lipcon <to...@cloudera.com> wrote:
>
> > On Thu, Jun 7, 2018 at 11:55 PM, Li,De(BDG) <li...@baidu.com> wrote:
> >
> > > Hi, Jim
> > >
> > > Thank you for your response.
> > > Actually, we start Palo in several years ago, and that time we developed
> > > the storage engine based on Mesa technology.
> > > Meanwhile we found Impala is a very good MPP SQL query engine, so we
> > > integrated them together.
> > >
> >
> > From what I can tell of the Palo source, it's not so much an integration as
> > a copied-and-modified codebase, right? i.e Palo does not use Impala as a
> > dependency, but rather shares a lot of code from the Impala project that
> > has since diverged.
> >
> >
> > >
> > > With this integration, the goal of Palo is to implement a single,
> > > full-featured, mysql protocol compatible data warehousing.
> > >
> >
> > That sounds pretty similar to the goals of the Impala project. Impala isn't
> > MySQL-compatible at the moment but that seems more like a particular
> > feature that could be added rather than a distinct identity of the project.
> > Otherwise, Impala's goal is to be a full featured data warehouse engine as
> > well.
> >
> > Generally Apache has no rules against multiple projects fulfilling similar
> > goals or use cases, even when those projects might compete. However I think
> > it would be relatively unusual to incubate a project that appears to be
> > derived from a fork of an existing project, at least without first
> > considering whether the additional feature set could be contributed back to
> > the existing community.
> >
> > -Todd
> >
> >
> > > On 2018/6/8 at 1:55 PM, "Jim Apple" <jb...@apache.org> wrote:
> > >
> > > >Hello! As a contributor to Impala, I’d be interested in hearing thoughts
> > > >from the Palo community about integration between Impala and Palo.
> > > >
> > > >For instance, are there any apparent design goals of Impala that the Palo
> > > >community thinks are fundamentally incompatible with Palo?
> > > >
> > > >Thanks,
> > > >Jim
> > > >
> > > >On 2018/06/08 04:45:32, "Li,De(BDG)" <li...@baidu.com> wrote:
> > > >> Hi all,
> > > >>
> > > >> I am Reed, as a developer worked with the team for Palo (a MPP-based
> > > >>interactive SQL data warehousing).
> > > >> https://github.com/baidu/palo/wiki/Palo-Overview
> > > >>
> > > >> We propose to contribute Palo as an Apache Incubator project, and
> > > >> we are still looking for possible Champion if anyone would like to
> > > >>volunteer. Thanks a lot.
> > > >>
> > > >> Best Regards,
> > > >> Reed
> > > >>
> > > >> ===================
> > > >> The draft of the proposal as below:
> > > >>
> > > >> #Apache Palo
> > > >>
> > > >> ##Abstract
> > > >>
> > > >> Palo is a MPP-based interactive SQL data warehousing for reporting
> and
> > > >>analysis.
> > > >>
> > > >> ##Proposal
> > > >>
> > > >> We propose to contribute the Palo codebase and associated artifacts
> > > >>(e.g. documentation, web-site content etc.) to the Apache Software
> > > >>Foundation with the intent of forming a productive, meritocratic and
> > > >>open community around Palo’s continued development, according to the
> > > >>‘Apache Way’.
> > > >>
> > > >> Baidu owns several trademarks regarding Palo, and proposes to
> transfer
> > > >>ownership of those trademarks in full to the ASF.
> > > >>
> > > >> ###Overview of Palo
> > > >>
> > > >> Palo’s implementation consists of two daemons: Frontend (FE) and
> > > >>Backend (BE).
> > > >>
> > > >> **Frontend daemon** consists of query coordinator and catalog
> manager.
> > > >>Query coordinator is responsible for receiving users’ sql queries,
> > > >>compiling queries and managing queries execution. Catalog manager is
> > > >>responsible for managing metadata such as databases, tables,
> > partitions,
> > > >>replicas and etc. Several frontend daemons could be deployed to
> > > >>guarantee fault-tolerance, and load balancing.
> > > >>
> > > >> **Backend daemon** stores the data and executes the query fragments.
> > > >>Many backend daemons could also be deployed to provide scalability
> and
> > > >>fault-tolerance.
> > > >>
> > > >> A typical Palo cluster generally composes of several frontend
> daemons
> > > >>and dozens to hundreds of backend daemons.
> > > >>
> > > >> Users can use MySQL client tools to connect any frontend daemon to
> > > >>submit SQL query. Frontend receives the query and compiles it into
> > query
> > > >>plans executable by the Backend. Then Frontend sends the query plan
> > > >>fragments to Backend. Backend will build a query execution DAG. Data
> is
> > > >>fetched and pipelined into the DAG. The final result response is sent
> > to
> > > >>client via Frontend. The distribution of query fragment execution
> takes
> > > >>minimizing data movement and maximizing scan locality as the main
> goal.
> > > >>
> > > >> ##Background
> > > >>
> > > >> At Baidu, Prior to Palo, different tools were deployed to solve
> > diverse
> > > >>requirements in many ways. And when a use case requires the
> > simultaneous
> > > >>availability of capabilities that cannot all be provided by a single
> > > >>tool, users were forced to build hybrid architectures that stitch
> > > >>multiple tools together, but we believe that they shouldn’t need to
> > > >>accept such inherent complexity. A storage system built to provide
> > great
> > > >>performance across a broad range of workloads provides a more elegant
> > > >>solution to the problems that hybrid architectures aim to solve. Palo
> > is
> > > >>the solution.
> > > >>
> > > >> Palo is designed to be a simple and single tightly coupled system,
> not
> > > >>depending on other systems. Palo provides high concurrent low latency
> > > >>point query performance, but also provides high throughput queries of
> > > >>ad-hoc analysis. Palo provides bulk-batch data loading, but also
> > > >>provides near real-time mini-batch data loading. Palo also provides
> > high
> > > >>availability, reliability, fault tolerance, and scalability.
> > > >>
> > > >> ##Rationale
> > > >>
> > > >> Palo mainly integrates the technology of Google Mesa and Apache
> > Impala.
> > > >>
> > > >> Mesa is a highly scalable analytic data storage system that stores
> > > >>critical measurement data related to Google's Internet advertising
> > > >>business. Mesa is designed to satisfy complex and challenging set of
> > > >>users’ and systems’ requirements, including near real-time data
> > > >>ingestion and query ability, as well as high availability,
> reliability,
> > > >>fault tolerance, and scalability for large data and query volumes.
> > > >>
> > > >> Impala is a modern, open-source MPP SQL engine architected from the
> > > >>ground up for the Hadoop data processing environment. At present, by
> > > >>virtue of its superior performance and rich functionality， Impala has
> > > >>been comparable to many commercial MPP database query engine. Mesa
> can
> > > >>satisfy the needs of many of our storage requirements, however Mesa
> > > >>itself does not provide a SQL query engine; Impala is a very good MPP
> > > >>SQL query engine, but the lack of a perfect distributed storage
> engine.
> > > >>So in the end we chose the combination of these two technologies.
> > > >>
> > > >> Learning from Mesa’s data model, we developed a distributed storage
> > > >>engine. Unlike Mesa, this storage engine does not rely on any
> > > >>distributed file system. Then we deeply integrate this storage engine
> > > >>with Impala query engine. Query compiling, query execution
> coordination
> > > >>and catalog management of storage engine are integrated to be
> frontend
> > > >>daemon; query execution and data storage are integrated to be backend
> > > >>daemon. With this integration, we implemented a single,
> full-featured,
> > > >>high performance state the art of MPP database, as well as
> maintaining
> > > >>the simplicity.
> > > >>
> > > >> ##Current Status
> > > >>
> > > >> Palo has been an open source project on GitHub
> > > >>(https://github.com/baidu/palo).
> > > >>
> > > >> ###Meritocracy
> > > >>
> > > >> Palo has been deployed in production at Baidu and is applying more
> > than
> > > >>200 lines of business. It has demonstrated great performance benefits
> > > >>and has proved to be a better way for reporting and analysis based
> big
> > > >>data. Still We look forward to growing a rich user and developer
> > > >>community.
> > > >>
> > > >> ###Community
> > > >>
> > > >> Palo seeks to develop developer and user communities during
> > incubation.
> > > >>
> > > >> ###Core Developers
> > > >>
> > > >> * Ruyue Ma (https://github.com/maruyue,
> > > >>maruyue@baidu.com<ma...@baidu.com>)
> > > >> * Chun Zhao (https://github.com/imay,
> > > >>buaa.zhaoc@gmail.com<ma...@gmail.com>)
> > > >> * Mingyu Chen (https://github.com/morningman,chenmingyu@baidu.com)
> > > >> * De Li（https://github.com/lide-reed,
> > > >>mailtolide@sina.com）<mailto:mailtolide@sina.com%EF%BC%89>
> > > >> * Hao Chen (https://github.com/chenhao7253886,
> > > >>chenhao16@baidu.com<ma...@baidu.com>)
> > > >> * Chaoyong Li (https://github.com/cyongli,
> > > >>lichaoyong@baidu.com<ma...@baidu.com>)
> > > >> * Bin Lin (https://github.com/lingbin,
> > > >>lingbinlb@gmail.com<ma...@gmail.com>)
> > > >>
> > > >> ###Alignment
> > > >>
> > > >> Palo is related to several other Apache projects:
> > > >>
> > > >> * Palo can also read data stored in Apache Hadoop clusters powered
> > > >>by the HDFS filesystem.
> > > >> * Palo is closely integrated with Impala, which is also being
> > > >>proposed to the Incubator.
> > > >> * Palo uses Apache Thrift as its RPC and serialization framework of
> > > >>choice.
> > > >>
> > > >> ##Known Risks
> > > >>
> > > >> ###Orphaned Products
> > > >>
> > > >> The core developers of the Palo team plan to work full time on this
> > > >>project. There is very little risk of Palo getting orphaned since at
> > > >>least one large company (Baidu) is extensively using it in
> > > >>production. For example, there are currently more than 200 use cases
> > > >>using Palo in production. Furthermore, since Palo was open sourced at
> > > >>the beginning of October 2017, it has received more than 660 stars
> > > >>and been forked nearly 170 times. We plan to extend and diversify this
> > > >>community further through Apache.
> > > >>
> > > >> ###Inexperience with Open Source
> > > >>
> > > >> The core developers are all active users and followers of open
> > > >>source. They are already committers and contributors to the Palo
> > > >>GitHub project. All have been involved with source code that has been
> > > >>released under an open source license, and several of them also have
> > > >>experience developing code in an open source environment. Though the
> > > >>core developers do not have Apache open source experience, there are
> > > >>plans to onboard individuals with Apache open source experience onto
> > > >>the project.
> > > >>
> > > >> ###Homogenous Developers
> > > >>
> > > >> Most of the core developers are from Baidu, but since Palo was open
> > > >>sourced, it has received a lot of bug fixes and enhancements from
> > > >>developers not working at Baidu.
> > > >>
> > > >> ###Reliance on Salaried Developers
> > > >>
> > > >> Baidu invested in Palo as the OLAP solution, and some of its key
> > > >>engineers are working full time on the project. In addition, since
> > > >>there is a growing Big Data need for scalable OLAP solutions, we look
> > > >>forward to other Apache developers and researchers contributing to the
> > > >>project. Key to addressing the risk of relying on salaried developers
> > > >>from a single entity is to increase the diversity of the contributors
> > > >>and to actively lobby for domain experts in the BI space to
> > > >>contribute. Apache Palo intends to do this.
> > > >>
> > > >> ###An Excessive Fascination with the Apache Brand
> > > >>
> > > >> Palo is proposing to enter incubation at Apache in order to help
> > > >>efforts to diversify the committer base, not so much to capitalize on
> > > >>the Apache brand. The Palo project is already in production use
> > > >>inside Baidu, but is not expected to be a Baidu product for external
> > > >>customers. As such, the Palo project is not seeking to use the Apache
> > > >>brand as a marketing tool.
> > > >>
> > > >> ##Documentation
> > > >>
> > > >> Information about Palo can be found at
> > > >>https://github.com/baidu/palo. The following links provide more
> > > >>information about Palo in open source:
> > > >>
> > > >> * Palo wiki site: https://github.com/baidu/palo/wiki
> > > >> * Codebase at Github: https://github.com/baidu/palo
> > > >> * Issue Tracking: https://github.com/baidu/palo/issues
> > > >> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> > > >> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
> > > >>
> > > >> ##Initial Source
> > > >>
> > > >> Palo has been under development since 2017 by a team of engineers at
> > > >>Baidu Inc. It is currently hosted on Github.com under an Apache
> > > >>license at https://github.com/baidu/palo.
> > > >>
> > > >> ##External Dependencies
> > > >>
> > > >> Palo has the following external dependencies.
> > > >>
> > > >> * Google gflags (BSD)
> > > >> * Google glog (BSD)
> > > >> * Apache Thrift (Apache Software License v2.0)
> > > >> * Apache Commons (Apache Software License v2.0)
> > > >> * Boost (Boost Software License)
> > > >> * OpenLdap (OpenLDAP Software License)
> > > >> * rapidjson (Tencent)
> > > >> * Google RE2 (BSD-style)
> > > >> * lz4 (BSD)
> > > >> * snappy (BSD)
> > > >> * cyrus-sasl (CMU License)
> > > >> * Twitter Bootstrap (Apache Software License v2.0)
> > > >> * d3 (BSD)
> > > >> * LLVM (BSD-like)
> > > >>
> > > >> Build and test dependencies:
> > > >>
> > > >> * ant (Apache Software License v2.0)
> > > >> * Apache Maven (Apache Software License v2.0)
> > > >> * cmake (BSD)
> > > >> * clang (BSD)
> > > >> * Google gtest (Apache Software License v2.0)
> > > >>
> > > >> ##Required Resources
> > > >>
> > > >> ###Mailing List
> > > >>
> > > >> There are currently no mailing lists. The usual mailing lists are
> > > >>expected to be set up when entering incubation:
> > > >>
> > > >>
> > > >> private@palo.incubator.apache.org
> > > >> dev@palo.incubator.apache.org
> > > >> commits@palo.incubator.apache.org
> > > >>
> > > >> ###Subversion Directory
> > > >>
> > > >> Upon entering incubation: https://github.com/baidu/palo.
> > > >> After incubation, we want to move the existing repo from
> > > >>https://github.com/baidu/palo to Apache infrastructure.
> > > >>
> > > >> ###Issue Tracking
> > > >>
> > > >> Palo currently uses GitHub to track issues. We would like to continue
> > > >>to do so while we discuss migration possibilities with the ASF Infra
> > > >>committee.
> > > >>
> > > >> ###Other Resources
> > > >>
> > > >> The existing code already has unit tests so we will make use of
> > > >>existing Apache continuous testing infrastructure. The resulting load
> > > >>should not be very large.
> > > >>
> > > >> ##Initial Committers
> > > >>
> > > >> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> > > >> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> > > >> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> > > >> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> > > >> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> > > >> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> > > >> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> > > >>
> > > >> ##Affiliations
> > > >>
> > > >> The initial committers are employees of Baidu Inc. The nominated
> > > >>mentors are employees of TODO.
> > > >>
> > > >> ##Sponsors
> > > >>
> > > >> ###Champion
> > > >>
> > > >> TODO
> > > >>
> > > >> ###Nominated Mentors
> > > >>
> > > >> * Sijie Guo, guosijie@gmail.com
> > > >> * Luke Han, lukehan@apache.org
> > > >> * Zheng Shao, zshao@apache.org
> > > >>
> > > >> ###Sponsoring Entity
> > > >>
> > > >> We are requesting the Incubator to sponsor this project.
> > > >>
> > > >
> > > >---------------------------------------------------------------------
> > > >To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > >For additional commands, e-mail: general-help@incubator.apache.org
> > > >
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
> > > For additional commands, e-mail: general-help@incubator.apache.org
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Looking for Champion

Posted by Tim Armstrong <ta...@cloudera.com>.

> Meanwhile we found Impala is a very good MPP SQL query engine, so we
> integrated them together.

Palo didn't integrate with Impala, it forked Impala's codebase and embedded
it in its own repository. I don't remember any attempts from the Palo team
to engage with the Impala community or attempt to work with us to
contribute any improvements.

It looks like Palo is still pulling in new code from Impala.  E.g. this
commit includes a bunch of code I wrote as part of IMPALA-3200:
https://github.com/baidu/palo/commit/2419384e8a211f10e7636afc6d3423700ba22b5a#diff-1c501d9a8b5c3d1d1cce48d5e1fb0edf

The code isn't owned by any individual, I contributed it to Apache and it's
free for anyone to do what they want to do with it, but pulling in
improvements from other projects without any attempt to attribute it or
contribute improvements back seems contrary to the Apache way.

Anyway, maybe incubation is an opportunity for us to work together, but I'd
hope that if Palo does go into incubation that it will rethink some of the
practices it's been following.

On Fri, Jun 8, 2018 at 9:12 AM, Todd Lipcon <to...@cloudera.com> wrote:

> On Thu, Jun 7, 2018 at 11:55 PM, Li,De(BDG) <li...@baidu.com> wrote:
>
> > Hi, Jim
> >
> > Thank you for your response.
> > Actually, we started Palo several years ago, and at that time we developed
> > the storage engine based on Mesa technology.
> > Meanwhile we found Impala is a very good MPP SQL query engine, so we
> > integrated them together.
> >
>
> From what I can tell of the Palo source, it's not so much an integration as
> a copied-and-modified codebase, right? i.e., Palo does not use Impala as a
> dependency, but rather shares a lot of code from the Impala project that
> has since diverged.
>
>
> >
> > With this integration, the goal of Palo is to implement a single,
> > full-featured, MySQL-protocol-compatible data warehouse.
> >
>
> That sounds pretty similar to the goals of the Impala project. Impala isn't
> MySQL-compatible at the moment but that seems more like a particular
> feature that could be added rather than a distinct identity of the project.
> Otherwise, Impala's goal is to be a full featured data warehouse engine as
> well.
>
> Generally Apache has no rules against multiple projects fulfilling similar
> goals or use cases, even when those projects might compete. However I think
> it would be relatively unusual to incubate a project that appears to be
> derived from a fork of an existing project, at least without first
> considering whether the additional feature set could be contributed back to
> the existing community.
>
> -Todd
>
>
> > On 2018/6/8 at 1:55 PM, "Jim Apple" <jb...@apache.org> wrote:
> >
> > >Hello! As a contributor to Impala, I’d be interested in hearing thoughts
> > >from the Palo community about integration between Impala and Palo.
> > >
> > >For instance, are there any apparent design goals of Impala that the
> > >Palo community thinks are fundamentally incompatible with Palo?
> > >
> > >Thanks,
> > >Jim
> > >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Thanks for Jim's suggestion. We will seriously consider this proposal.
The Palo development team will also carefully discuss the other opinions
raised here and give everyone a unified reply next week.



On 2018/6/9 at 7:41 AM, "Jim Apple" <jb...@cloudera.com> wrote:

>>
>> Generally Apache has no rules against multiple projects fulfilling
>> similar goals or use cases, even when those projects might compete.
>> However I think it would be relatively unusual to incubate a project
>> that appears to be derived from a fork of an existing project, at least
>> without first considering whether the additional feature set could be
>> contributed back to the existing community.
>>
>
>And this is something I'm really excited about. If only the storage system
>part of Palo were contributed to the ASF, and simultaneously the Palo
>community and the Impala community worked together to integrate the query
>engine work of Palo into Impala, then this could provide a lot of benefit
>to users, I think. My hope is that it would eliminate the toil the Palo
>community is engaged in by rebasing Impala changes (as Tim noticed).
>Impala, meanwhile, might benefit from some changes Palo has made, like
>SIMD filtering.
>
>This could be a lot of work, but the current system seems to already
>include quite a lot of inefficiency from the duplication.

Re: Looking for Champion

Posted by Jim Apple <jb...@cloudera.com>.

>
> Generally Apache has no rules against multiple projects fulfilling similar
> goals or use cases, even when those projects might compete. However I think
> it would be relatively unusual to incubate a project that appears to be
> derived from a fork of an existing project, at least without first
> considering whether the additional feature set could be contributed back to
> the existing community.
>

And this is something I'm really excited about. If only the storage system
part of Palo were contributed to the ASF, and simultaneously the Palo
community and the Impala community worked together to integrate the query
engine work of Palo into Impala, then this could provide a lot of benefit
to users, I think. My hope is that it would eliminate the toil the Palo
community is engaged in by rebasing Impala changes (as Tim noticed).
Impala, meanwhile, might benefit from some changes Palo has made, like SIMD
filtering.

This could be a lot of work, but the current system seems to already
include quite a lot of inefficiency from the duplication.

Re: Looking for Champion

Posted by Todd Lipcon <to...@cloudera.com>.

On Thu, Jun 7, 2018 at 11:55 PM, Li,De(BDG) <li...@baidu.com> wrote:

> Hi, Jim
>
> Thank you for your response.
> Actually, we started Palo several years ago, and at that time we developed
> the storage engine based on Mesa technology.
> Meanwhile we found Impala is a very good MPP SQL query engine, so we
> integrated them together.
>

From what I can tell of the Palo source, it's not so much an integration as
a copied-and-modified codebase, right? i.e., Palo does not use Impala as a
dependency, but rather shares a lot of code from the Impala project that
has since diverged.


>
> With this integration, the goal of Palo is to implement a single,
> full-featured, MySQL-protocol-compatible data warehouse.
>

That sounds pretty similar to the goals of the Impala project. Impala isn't
MySQL-compatible at the moment but that seems more like a particular
feature that could be added rather than a distinct identity of the project.
Otherwise, Impala's goal is to be a full featured data warehouse engine as
well.

Generally Apache has no rules against multiple projects fulfilling similar
goals or use cases, even when those projects might compete. However I think
it would be relatively unusual to incubate a project that appears to be
derived from a fork of an existing project, at least without first
considering whether the additional feature set could be contributed back to
the existing community.

-Todd


> On 2018/6/8 at 1:55 PM, "Jim Apple" <jb...@apache.org> wrote:
>
> >Hello! As a contributor to Impala, I’d be interested in hearing thoughts
> >from the Palo community about integration between Impala and Palo.
> >
> >For instance, are there any apparent design goals of Impala that the Palo
> >community thinks are fundamentally incompatible with Palo?
> >
> >Thanks,
> >Jim
> >
> >On 2018/06/08 04:45:32, "Li,De(BDG)" <li...@baidu.com> wrote:
> >> Hi all,
> >>
> >> I am Reed, a developer working with the team on Palo (an MPP-based
> >>interactive SQL data warehouse).
> >> https://github.com/baidu/palo/wiki/Palo-Overview
> >>
> >> We propose to contribute Palo as an Apache Incubator project, and
> >> we are still looking for a Champion, if anyone would like to
> >>volunteer. Thanks a lot.
> >>
> >> Best Regards,
> >> Reed
> >>
> >> ===================
> >> The draft of the proposal is below:
> >>
> >> #Apache Palo
> >>
> >> ##Abstract
> >>
> >> Palo is an MPP-based interactive SQL data warehouse for reporting and
> >>analysis.
> >>
> >> ##Proposal
> >>
> >> We propose to contribute the Palo codebase and associated artifacts
> >>(e.g. documentation, web-site content etc.) to the Apache Software
> >>Foundation with the intent of forming a productive, meritocratic and
> >>open community around Palo’s continued development, according to the
> >>‘Apache Way’.
> >>
> >> Baidu owns several trademarks regarding Palo, and proposes to transfer
> >>ownership of those trademarks in full to the ASF.
> >>
> >> ###Overview of Palo
> >>
> >> Palo’s implementation consists of two daemons: Frontend (FE) and
> >>Backend (BE).
> >>
> >> **Frontend daemon** consists of a query coordinator and a catalog
> >>manager. The query coordinator is responsible for receiving users’ SQL
> >>queries, compiling queries and managing query execution. The catalog
> >>manager is responsible for managing metadata such as databases, tables,
> >>partitions and replicas. Several frontend daemons can be deployed to
> >>provide fault tolerance and load balancing.
> >>
> >> **Backend daemon** stores the data and executes the query fragments.
> >>Many backend daemons could also be deployed to provide scalability and
> >>fault-tolerance.
> >>
> >> A typical Palo cluster generally consists of several frontend daemons
> >>and dozens to hundreds of backend daemons.
> >>
> >> Users can use MySQL client tools to connect to any frontend daemon and
> >>submit SQL queries. The Frontend receives a query and compiles it into
> >>query plans executable by the Backend. The Frontend then sends the query
> >>plan fragments to the Backend. The Backend builds a query execution DAG;
> >>data is fetched and pipelined into the DAG. The final result is sent to
> >>the client via the Frontend. The distribution of query fragment execution
> >>takes minimizing data movement and maximizing scan locality as its main
> >>goal.
> >>
> >> ##Background
> >>
> >> At Baidu, Prior to Palo, different tools were deployed to solve diverse
> >>requirements in many ways. And when a use case requires the simultaneous
> >>availability of capabilities that cannot all be provided by a single
> >>tool, users were forced to build hybrid architectures that stitch
> >>multiple tools together, but we believe that they shouldn’t need to
> >>accept such inherent complexity. A storage system built to provide great
> >>performance across a broad range of workloads provides a more elegant
> >>solution to the problems that hybrid architectures aim to solve. Palo is
> >>the solution.
> >>
> >> Palo is designed to be a simple and single tightly coupled system, not
> >>depending on other systems. Palo provides high concurrent low latency
> >>point query performance, but also provides high throughput queries of
> >>ad-hoc analysis. Palo provides bulk-batch data loading, but also
> >>provides near real-time mini-batch data loading. Palo also provides high
> >>availability, reliability, fault tolerance, and scalability.
> >>
> >> ##Rationale
> >>
> >> Palo mainly integrates the technology of Google Mesa and Apache Impala.
> >>
> >> Mesa is a highly scalable analytic data storage system that stores
> >>critical measurement data related to Google's Internet advertising
> >>business. Mesa is designed to satisfy complex and challenging set of
> >>users’ and systems’ requirements, including near real-time data
> >>ingestion and query ability, as well as high availability, reliability,
> >>fault tolerance, and scalability for large data and query volumes.
> >>
> >> Impala is a modern, open-source MPP SQL engine architected from the
> >>ground up for the Hadoop data processing environment. At present, by
> >>virtue of its superior performance and rich functionality， Impala has
> >>been comparable to many commercial MPP database query engine. Mesa can
> >>satisfy the needs of many of our storage requirements, however Mesa
> >>itself does not provide a SQL query engine; Impala is a very good MPP
> >>SQL query engine, but the lack of a perfect distributed storage engine.
> >>So in the end we chose the combination of these two technologies.
> >>
> >> Learning from Mesa’s data model, we developed a distributed storage
> >>engine. Unlike Mesa, this storage engine does not rely on any
> >>distributed file system. Then we deeply integrate this storage engine
> >>with Impala query engine. Query compiling, query execution coordination
> >>and catalog management of storage engine are integrated to be frontend
> >>daemon; query execution and data storage are integrated to be backend
> >>daemon. With this integration, we implemented a single, full-featured,
> >>high performance state the art of MPP database, as well as maintaining
> >>the simplicity.
> >>
> >> ##Current Status
> >>
> >> Palo has been an open source project on GitHub
> >>(https://github.com/baidu/palo).
> >>
> >> ###Meritocracy
> >>
> >> Palo has been deployed in production at Baidu and is applying more than
> >>200 lines of business. It has demonstrated great performance benefits
> >>and has proved to be a better way for reporting and analysis based big
> >>data. Still We look forward to growing a rich user and developer
> >>community.
> >>
> >> ###Community
> >>
> >> Palo seeks to develop developer and user communities during incubation.
> >>
> >> ###Core Developers
> >>
> >> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> >> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> >> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> >> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> >> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> >> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> >> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >>
> >> ###Alignment
> >>
> >> Palo is related to several other Apache projects:
> >>
> >> * Palo can also read data stored in Apache Hadoop clusters powered by
> >>the HDFS filesystem.
> >> * Palo is closely integrated with Impala, which is also being proposed
> >>to the Incubator.
> >> * Palo uses Apache Thrift as its RPC and serialization framework of
> >>choice.
> >>
> >> ##Known Risks
> >>
> >> ###Orphaned Products
> >>
> >> The core developers of Palo team plan to work full time on this
> >>project. There is very little risk of Palo getting orphaned since at
> >>least one large company (Baidu) is extensively using it in their
> >>production. For example, currently there are more than 200 use cases
> >>using Palo in production. Furthermore, since Palo was open sourced at
> >>the beginning of October 2017, it has received more than 660 stars and
> >>been forked nearly 170 times. We plan to extend and diversify this
> >>community further through Apache.
> >>
> >> ###Inexperience with Open Source
> >>
> >> The core developers are all active users and followers of open source.
> >>They are already committers and contributors to the Palo Github project.
> >>All have been involved with the source code that has been released under
> >>an open source license, and several of them also have experience
> >>developing code in an open source environment. Though the core set of
> >>Developers do not have Apache Open Source experience, there are plans to
> >>onboard individuals with Apache open source experience on to the project.
> >>
> >> ###Homogenous Developers
> >>
> >> The most of core developers are from Baidu, but after Palo was open
> >>sourced, Palo received a lot of bug fixes and enhancements from other
> >>developers not working at Baidu.
> >>
> >> ###Reliance on Salaried Developers
> >>
> >> Baidu invested in Palo as the OLAP solution and some of its key
> >>engineers are working full time on the project. In addition, since there
> >>is a growing Big Data need for scalable OLAP solutions, we look forward
> >>to other Apache developers and researchers to contribute to the project.
> >>Also key to addressing the risk associated with relying on Salaried
> >>developers from a single entity is to increase the diversity of the
> >>contributors and actively lobby for Domain experts in the BI space to
> >>contribute. Apache Palo intends to do this.
> >>
> >> ###An Excessive Fascination with the Apache Brand
> >>
> >> Palo is proposing to enter incubation at Apache in order to help
> >>efforts to diversify the committer-base, not so much to capitalize on
> >>the Apache brand. The Palo project is in production use already inside
> >>Baidu, but is not expected to be an Baidu product for external
> >>customers. As such, the Palo project is not seeking to use the Apache
> >>brand as a marketing tool.
> >>
> >> ##Documentation
> >>
> >> Information about Palo can be found at https://github.com/baidu/palo.
> >>The following links provide more information about Palo in open source:
> >>
> >> * Palo wiki site: https://github.com/baidu/palo/wiki
> >> * Codebase at Github: https://github.com/baidu/palo
> >> * Issue Tracking: https://github.com/baidu/palo/issues
> >> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
> >> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
> >>
> >> ##Initial Source
> >>
> >> Palo has been under development since 2017 by a team of engineers at
> >>Baidu Inc. It is currently hosted on Github.com under an Apache license
> >>at https://github.com/baidu/palo.
> >>
> >> ##External Dependencies
> >>
> >> Palo has the following external dependencies.
> >>
> >> * Google gflags (BSD)
> >> * Google glog (BSD)
> >> * Apache Thrift (Apache Software License v2.0)
> >> * Apache Commons (Apache Software License v2.0)
> >> * Boost (Boost Software License)
> >> * OpenLdap (OpenLDAP Software License)
> >> * rapidjson (Tencent)
> >> * Google RE2 (BSD-style)
> >> * lz4 (BSD)
> >> * snappy (BSD)
> >> * cyrus-sasl (CMU License)
> >> * Twitter Bootstrap (Apache Software License v2.0)
> >> * d3 (BSD)
> >> * LLVM (BSD-like)
> >>
> >> Build and test dependencies:
> >>
> >> * ant (Apache Software License v2.0)
> >> * Apache Maven (Apache Software License v2.0)
> >> * cmake (BSD)
> >> * clang (BSD)
> >> * Google gtest (Apache Software License v2.0)
> >>
> >> ##Required Resources
> >>
> >> ###Mailing List
> >>
> >> There are currently no mailing lists. The usual mailing lists are
> >>expected to be set up when entering incubation:
> >>
> >> private@palo.incubator.apache.org
> >> dev@palo.incubator.apache.org
> >> commits@palo.incubator.apache.org
> >>
> >> ###Subversion Directory
> >>
> >> Upon entering incubation: https://github.com/baidu/palo.
> >> After incubation, we want to move the existing repo from
> >>https://github.com/baidu/palo to Apache infrastructure.
> >>
> >> ###Issue Tracking
> >>
> >> Palo currently uses GitHub to track issues. Would like to continue to
> >>do so while we discuss migration possibilities with the ASF Infra
> >>committee.
> >>
> >> ###Other Resources
> >>
> >> The existing code already has unit tests so we will make use of
> >>existing Apache continuous testing infrastructure. The resulting load
> >>should not be very large.
> >>
> >> ##Initial Committers
> >>
> >> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
> >> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
> >> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
> >> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
> >> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
> >> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
> >> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
> >>
> >> ##Affiliations
> >>
> >> The initial committers are employees of Baidu Inc.. The nominated
> >>mentors are employees of TODO.
> >>
> >> ##Sponsors
> >>
> >> ###Champion
> >>
> >> TODO
> >>
> >> ###Nominated Mentors
> >>
> >> * Sijie Guo, guosijie@gmail.com
> >> * Luke Han, lukehan@apache.org
> >> * Zheng Shao, zshao@apache.org
> >>
> >> ###Sponsoring Entity
> >>
> >> We are requesting the Incubator to sponsor this project.
> >>



-- 
Todd Lipcon
Software Engineer, Cloudera

Re: Looking for Champion

Posted by "Li,De(BDG)" <li...@baidu.com>.

Hi Jim,

Thank you for your response.
Actually, we started Palo several years ago, and at that time we developed
the storage engine based on Mesa technology.
Meanwhile, we found that Impala is a very good MPP SQL query engine, so we
integrated the two.

With this integration, the goal of Palo is to implement a single,
full-featured, MySQL-protocol-compatible data warehouse.


Best regards,
Reed

On 2018/6/8 at 1:55 PM, "Jim Apple" <jb...@apache.org> wrote:

>Hello! As a contributor to Impala, I’d be interested in hearing thoughts
>from the Palo community about integration between Impala and Palo.
>
>For instance, are there any apparent design goals of Impala that the Palo
>community thinks are fundamentally incompatible with Palo?
>
>Thanks,
>Jim
>

Re: Looking for Champion

Posted by "Tan,Zhongyi" <ta...@baidu.com>.

Hi guys,

Palo is a good project.

Is there anyone who would volunteer to be its Champion and
help us go through the process of becoming an Apache project?

Thanks

>
>On 2018/06/08 04:45:32, "Li,De(BDG)" <li...@baidu.com> wrote:
>> Hi all,
>> 
>> I am Reed, as a developer worked with the team for Palo (a MPP-based
>>interactive SQL data warehousing).
>> https://github.com/baidu/palo/wiki/Palo-Overview
>> 
>> We propose to contribute Palo as an Apache Incubator project, and
>> we are still looking for possible Champion if anyone would like to
>>volunteer. Thanks a lot.
>> 
>> Best Regards,
>> Reed
>> 
>> ===================
>> The draft of the proposal as below:
>> 
>> #Apache Palo
>> 
>> ##Abstract
>> 
>> Palo is a MPP-based interactive SQL data warehousing for reporting and
>>analysis.
>> 
>> ##Proposal
>> 
>> We propose to contribute the Palo codebase and associated artifacts
>>(e.g. documentation, web-site content etc.) to the Apache Software
>>Foundation with the intent of forming a productive, meritocratic and
>>open community around Palo’s continued development, according to the
>>‘Apache Way’.
>> 
>> Baidu owns several trademarks regarding Palo, and proposes to transfer
>>ownership of those trademarks in full to the ASF.
>> 
>> ###Overview of Palo
>> 
>> Palo’s implementation consists of two daemons: Frontend (FE) and
>>Backend (BE).
>> 
>> **Frontend daemon** consists of query coordinator and catalog manager.
>>Query coordinator is responsible for receiving users’ sql queries,
>>compiling queries and managing queries execution. Catalog manager is
>>responsible for managing metadata such as databases, tables, partitions,
>>replicas and etc. Several frontend daemons could be deployed to
>>guarantee fault-tolerance, and load balancing.
>> 
>> **Backend daemon** stores the data and executes the query fragments.
>>Many backend daemons could also be deployed to provide scalability and
>>fault-tolerance.
>> 
>> A typical Palo cluster generally composes of several frontend daemons
>>and dozens to hundreds of backend daemons.
>> 
>> Users can use MySQL client tools to connect any frontend daemon to
>>submit SQL query. Frontend receives the query and compiles it into query
>>plans executable by the Backend. Then Frontend sends the query plan
>>fragments to Backend. Backend will build a query execution DAG. Data is
>>fetched and pipelined into the DAG. The final result response is sent to
>>client via Frontend. The distribution of query fragment execution takes
>>minimizing data movement and maximizing scan locality as the main goal.
>> 
>> ##Background
>> 
>> At Baidu, Prior to Palo, different tools were deployed to solve diverse
>>requirements in many ways. And when a use case requires the simultaneous
>>availability of capabilities that cannot all be provided by a single
>>tool, users were forced to build hybrid architectures that stitch
>>multiple tools together, but we believe that they shouldn’t need to
>>accept such inherent complexity. A storage system built to provide great
>>performance across a broad range of workloads provides a more elegant
>>solution to the problems that hybrid architectures aim to solve. Palo is
>>the solution.
>> 
>> Palo is designed to be a simple and single tightly coupled system, not
>>depending on other systems. Palo provides high concurrent low latency
>>point query performance, but also provides high throughput queries of
>>ad-hoc analysis. Palo provides bulk-batch data loading, but also
>>provides near real-time mini-batch data loading. Palo also provides high
>>availability, reliability, fault tolerance, and scalability.
>> 
>> ##Rationale
>> 
>> Palo mainly integrates the technology of Google Mesa and Apache Impala.
>> 
>> Mesa is a highly scalable analytic data storage system that stores
>>critical measurement data related to Google's Internet advertising
>>business. Mesa is designed to satisfy complex and challenging set of
>>users’ and systems’ requirements, including near real-time data
>>ingestion and query ability, as well as high availability, reliability,
>>fault tolerance, and scalability for large data and query volumes.
>> 
>> Impala is a modern, open-source MPP SQL engine architected from the
>>ground up for the Hadoop data processing environment. At present, by
>>virtue of its superior performance and rich functionality， Impala has
>>been comparable to many commercial MPP database query engine. Mesa can
>>satisfy the needs of many of our storage requirements, however Mesa
>>itself does not provide a SQL query engine; Impala is a very good MPP
>>SQL query engine, but the lack of a perfect distributed storage engine.
>>So in the end we chose the combination of these two technologies.
>> 
>> Learning from Mesa’s data model, we developed a distributed storage
>>engine. Unlike Mesa, this storage engine does not rely on any
>>distributed file system. Then we deeply integrate this storage engine
>>with Impala query engine. Query compiling, query execution coordination
>>and catalog management of storage engine are integrated to be frontend
>>daemon; query execution and data storage are integrated to be backend
>>daemon. With this integration, we implemented a single, full-featured,
>>high performance state the art of MPP database, as well as maintaining
>>the simplicity.
>> 
>> ##Current Status
>> 
>> Palo has been an open source project on GitHub
>>(https://github.com/baidu/palo).
>> 
>> ###Meritocracy
>> 
>> Palo has been deployed in production at Baidu and is applying more than
>>200 lines of business. It has demonstrated great performance benefits
>>and has proved to be a better way for reporting and analysis based big
>>data. Still We look forward to growing a rich user and developer
>>community.
>> 
>> ###Community
>> 
>> Palo seeks to develop developer and user communities during incubation.
>> 
>> ###Core Developers
>> 
>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> 
>> ###Alignment
>> 
>> Palo is related to several other Apache projects:
>> 
>> * Palo can also read data stored in Apache Hadoop clusters powered by
>>the HDFS filesystem.
>> * Palo is closely integrated with Impala, which is also being proposed
>>to the Incubator.
>> * Palo uses Apache Thrift as its RPC and serialization framework of
>>choice.
>> 
>> ##Known Risks
>> 
>> ###Orphaned Products
>> 
>> The core developers of the Palo team plan to work full time on this
>>project. There is very little risk of Palo being orphaned, since at
>>least one large company (Baidu) uses it extensively in production.
>>For example, there are currently more than 200 use cases running on
>>Palo in production. Furthermore, since Palo was open sourced at the
>>beginning of October 2017, it has received more than 660 stars and
>>been forked nearly 170 times. We plan to extend and diversify this
>>community further through Apache.
>> 
>> ###Inexperience with Open Source
>> 
>> The core developers are all active users and followers of open source.
>>They are already committers and contributors to the Palo GitHub project.
>>All have worked on source code that has been released under an open
>>source license, and several of them also have experience developing
>>code in an open source environment. Though the core developers do not
>>have Apache open source experience, there are plans to onboard
>>individuals with Apache open source experience onto the project.
>> 
>> ###Homogenous Developers
>> 
>> Most of the core developers are from Baidu, but since Palo was open
>>sourced, it has received many bug fixes and enhancements from
>>developers outside Baidu.
>> 
>> ###Reliance on Salaried Developers
>> 
>> Baidu has invested in Palo as its OLAP solution, and some of its key
>>engineers are working full time on the project. In addition, since there
>>is a growing Big Data need for scalable OLAP solutions, we look forward
>>to other Apache developers and researchers contributing to the project.
>>The key to addressing the risk of relying on salaried developers from a
>>single entity is to increase the diversity of the contributors and to
>>actively lobby for domain experts in the BI space to contribute. Apache
>>Palo intends to do this.
>> 
>> ###An Excessive Fascination with the Apache Brand
>> 
>> Palo is proposing to enter incubation at Apache in order to help
>>efforts to diversify the committer base, not so much to capitalize on
>>the Apache brand. The Palo project is already in production use inside
>>Baidu, but is not expected to be a Baidu product for external
>>customers. As such, the Palo project is not seeking to use the Apache
>>brand as a marketing tool.
>> 
>> ##Documentation
>> 
>> Information about Palo can be found at https://github.com/baidu/palo.
>>The following links provide more information about Palo in open source:
>> 
>> * Palo wiki site: https://github.com/baidu/palo/wiki
>> * Codebase at GitHub: https://github.com/baidu/palo
>> * Issue Tracking: https://github.com/baidu/palo/issues
>> * Overview: https://github.com/baidu/palo/wiki/Palo-Overview
>> * FAQ: https://github.com/baidu/palo/wiki/Palo-FAQ
>> 
>> ##Initial Source
>> 
>> Palo has been under development since 2017 by a team of engineers at
>>Baidu Inc. It is currently hosted on GitHub under an Apache license
>>at https://github.com/baidu/palo.
>> 
>> ##External Dependencies
>> 
>> Palo has the following external dependencies.
>> 
>> * Google gflags (BSD)
>> * Google glog (BSD)
>> * Apache Thrift (Apache Software License v2.0)
>> * Apache Commons (Apache Software License v2.0)
>> * Boost (Boost Software License)
>> * OpenLDAP (OpenLDAP Software License)
>> * rapidjson (MIT License; published by Tencent)
>> * Google RE2 (BSD-style)
>> * lz4 (BSD)
>> * snappy (BSD)
>> * cyrus-sasl (CMU License)
>> * Twitter Bootstrap (Apache Software License v2.0)
>> * d3 (BSD)
>> * LLVM (BSD-like)
>> 
>> Build and test dependencies:
>> 
>> * ant (Apache Software License v2.0)
>> * Apache Maven (Apache Software License v2.0)
>> * cmake (BSD)
>> * clang (BSD)
>> * Google gtest (BSD)
>> 
>> ##Required Resources
>> 
>> ###Mailing List
>> 
>> There are currently no mailing lists. The usual mailing lists are
>>expected to be set up when entering incubation:
>> 
>> 
>> private@palo.incubator.apache.org
>> dev@palo.incubator.apache.org
>> commits@palo.incubator.apache.org
>> 
>> ###Subversion Directory
>> 
>> Upon entering incubation: https://github.com/baidu/palo.
>> After incubation, we want to move the existing repo from
>>https://github.com/baidu/palo to Apache infrastructure.
>> 
>> ###Issue Tracking
>> 
>> Palo currently uses GitHub to track issues. We would like to continue
>>to do so while we discuss migration possibilities with the ASF Infra
>>committee.
>> 
>> ###Other Resources
>> 
>> The existing code already has unit tests so we will make use of
>>existing Apache continuous testing infrastructure. The resulting load
>>should not be very large.
>> 
>> ##Initial Committers
>> 
>> * Ruyue Ma (https://github.com/maruyue, maruyue@baidu.com)
>> * Chun Zhao (https://github.com/imay, buaa.zhaoc@gmail.com)
>> * Mingyu Chen (https://github.com/morningman, chenmingyu@baidu.com)
>> * De Li (https://github.com/lide-reed, mailtolide@sina.com)
>> * Hao Chen (https://github.com/chenhao7253886, chenhao16@baidu.com)
>> * Chaoyong Li (https://github.com/cyongli, lichaoyong@baidu.com)
>> * Bin Lin (https://github.com/lingbin, lingbinlb@gmail.com)
>> 
>> ##Affiliations
>> 
>> The initial committers are employees of Baidu Inc. The nominated
>>mentors are employees of TODO.
>> 
>> ##Sponsors
>> 
>> ###Champion
>> 
>> TODO
>> 
>> ###Nominated Mentors
>> 
>> * Sijie Guo, guosijie@gmail.com
>> * Luke Han, lukehan@apache.org
>> * Zheng Shao, zshao@apache.org
>> 
>> ###Sponsoring Entity
>> 
>> We are requesting the Incubator to sponsor this project.
>> 

Re: Looking for Champion

Posted by Jim Apple <jb...@apache.org>.

Hello! As a contributor to Impala, I’d be interested in hearing thoughts from the Palo community about integration between Impala and Palo.

For instance, are there any apparent design goals of Impala that the Palo community thinks are fundamentally incompatible with Palo?

Thanks,
Jim

