Posted to dev@hbase.apache.org by Devaraj Das <dd...@hortonworks.com> on 2016/05/02 19:52:12 UTC
Re: What's going on? Two C++ clients being developed

(Meant to send this earlier but got delayed)
Thanks, Clay, for the input. I would like to give a quick update on where we are at this point and solicit thoughts on how to proceed from here:

1. The latest patch from Vamsi uploaded to RB has the copyright and other license-related issues taken care of.

2. We are looking at making the configuration pluggable and providing an implementation that works with XML files. Perhaps something like this: if the configured conf directory has XML files, assume they are the default config files; if not, use the config loader meant for that particular file type. The filename extension could be used to make this choice.
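
A minimal sketch of the extension-based selection described in point 2. All names here (ConfigMap, ConfigLoader, ConfigLoaderRegistry, FileExtension) are hypothetical and not taken from any patch in this thread; the actual parsing is left to whatever loader is registered.

```cpp
#include <algorithm>
#include <cctype>
#include <functional>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical types for illustration only.
using ConfigMap = std::map<std::string, std::string>;
using ConfigLoader = std::function<ConfigMap(const std::string& path)>;

// Returns the lower-cased extension of `path` without the dot,
// or an empty string if the file name has no extension.
std::string FileExtension(const std::string& path) {
  const auto slash = path.find_last_of("/\\");
  const auto dot = path.find_last_of('.');
  if (dot == std::string::npos ||
      (slash != std::string::npos && dot < slash)) {
    return "";
  }
  std::string ext = path.substr(dot + 1);
  std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return ext;
}

// Maps a filename extension ("xml", "properties", ...) to a loader.
class ConfigLoaderRegistry {
 public:
  void Register(const std::string& extension, ConfigLoader loader) {
    loaders_[extension] = std::move(loader);
  }

  // Picks the loader by extension; throws if none is registered for it.
  ConfigMap Load(const std::string& path) const {
    const auto it = loaders_.find(FileExtension(path));
    if (it == loaders_.end()) {
      throw std::runtime_error("no config loader registered for " + path);
    }
    return it->second(path);
  }

 private:
  std::map<std::string, ConfigLoader> loaders_;
};
```

An XML loader would be registered under "xml" and serve as the default; other formats could be added later without touching the callers of Load.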

3. We are continuing to investigate the sync/async RPC implementation. This is something we could actively work on together with Elliott. On a related note, we have an implementation of the "batch" calls that performs each get/put sequentially. The HBase Java client's AsyncProcess instead reaches out to multiple regionservers in parallel. We are looking at whether implementing the RPC as async from the get-go would obviate the need for an AsyncProcess equivalent in the C++ client.
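
To illustrate the difference from a sequential batch, here is a rough sketch of grouping gets by regionserver and issuing one call per server in parallel. It is not how AsyncProcess or either patch is implemented; Get, Result, ServerLocator, ServerCall and BatchGet are hypothetical stand-ins, and std::async is used only to keep the example self-contained.

```cpp
#include <functional>
#include <future>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-ins for the real client classes.
struct Get { std::string row; };
struct Result { std::string row; std::string value; };

// Assumed callbacks: one maps a row to the server hosting it,
// the other performs a multi-get against a single server.
using ServerLocator = std::function<std::string(const std::string& row)>;
using ServerCall = std::function<std::vector<Result>(
    const std::string& server, const std::vector<Get>& gets)>;

// Groups the gets by server, then runs one call per server concurrently.
// Results come back grouped by server name, not in the input order.
std::vector<Result> BatchGet(const std::vector<Get>& gets,
                             const ServerLocator& locate,
                             const ServerCall& call) {
  std::map<std::string, std::vector<Get>> by_server;
  for (const auto& get : gets) {
    by_server[locate(get.row)].push_back(get);
  }

  std::vector<std::future<std::vector<Result>>> pending;
  for (const auto& entry : by_server) {
    pending.push_back(
        std::async(std::launch::async, call, entry.first, entry.second));
  }

  std::vector<Result> results;
  for (auto& future : pending) {
    std::vector<Result> part = future.get();  // blocks until that server replies
    results.insert(results.end(), part.begin(), part.end());
  }
  return results;
}
```

With a natively async RPC layer, the per-server futures would come from the RPC client itself rather than from threads spawned here, which is the case where a separate AsyncProcess-style layer might become unnecessary.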

4. We can look at making smaller patches for the various client APIs and associated classes like GET, PUT, TableName, etc. (this is called out in the last mail from Enis). The way I see it, there is the front-end work of providing classes for the APIs, and there is the back-end work: connection management, RPC, and AsyncProcess-like functionality. A good amount of the former is done in Vamsi's patch, and there is an async RPC basis for the back end in Elliott's branch. We should see how we can leverage both and come up with one unified implementation if possible.

Thoughts?

________________________________________
From: Clay Baenziger (BLOOMBERG/ 731 LEX) <cb...@bloomberg.net>
Sent: Monday, April 25, 2016 11:01 AM
To: dev@hbase.apache.org
Subject: Re: What's going on? Two C++ clients being developed

From an operator's view-point, I would add:

As an operator who often has to build Hadoop ecosystem components and is chiefly interested in these C++ bindings, I am concerned that a build-chain utility that is not from Apache, GNU, or another large community-supported open source project is a liability to this codebase and its adoption.

As to the configuration process, I would really like to keep with XML. I am looking to use Maven repositories to host the configurations of our clusters (e.g. a POM-file per cluster hosting hbase-site.xml, hdfs-site.xml, etc.); it would be a pain to have to synchronize two configurations of the same information both on the publishing side and client side dependent on the use. It would be possible to duplicate all this information just because of different consumers but XML should not be terribly difficult for C/C++ code to parse -- e.g. OpenSolaris's use in SMF, Zones, etc. Further, for an example of an incubator project which uses XML configs already, see Apache HAWQ's use of hdfs-client.xml and similar for YARN with their pure non-Java implementation for HDFS and YARN client: https://github.com/apache/incubator-hawq/blob/9452055bc74e64f308a8b6cc2b7ab946e5584ba8/src/backend/utils/misc/etc/hdfs-client.xml.

I certainly would not be opposed to a pluggable configuration system. I'd imagine Apache Ambari could use that to avoid materializing XML configs from Postgres; I could also see using ZooKeeper, akin to how Apache SolrCloud uses ZooKeeper for configuration information. But at this time, we have XML files for better or worse, and a pluggable configuration system sounds like a great separate JIRA.

-Clay


From: dev@hbase.apache.org At: Apr 19 2016 15:31:25
To: dev@hbase.apache.org
Subject: Fwd:Re: What's going on? Two C++ clients being developed at the moment?

So there are a couple of technical topics that we can further discuss and
hopefully come to a conclusion for going forward.

1. Build system. I am in the auto-tools camp, unless there is a very good
reason to use a non-standard tool like Buck / Bazel, etc. Not sure whether
it makes sense to have two different build systems concurrently. Can we do
the main build with make, and create a wrapper with Buck?

2. XML based configuration versus something native. I strongly believe that
we should support standard hbase-site.xml. A lot of tooling in the Hadoop
ecosystem has already been developed for managing and deploying XML based
configurations over the years. Puppet / Chef scripts, Ambari, CM, etc. all
understand and support hbase-site.xml. This is also true for Hadoop
operators, who should be familiar with modifying these files. So it would be
a real pain if we suddenly came up with yet another config format that
operators and tools have to learn how to deploy and manage. What if
there are both Java clients and C++ clients on the same nodes? How do you
keep two config files in sync? Then there is the issue of
hbase-default.xml. It should be sourced by the config system of the C++
client; otherwise, how can we default the values? We cannot keep a different
version of the defaults file for the native client as well, since the two
will go out of sync really soon.

Having said that though, it does not mean we should not support other
config formats. We can make the configuration pluggable, so that if needed
other implementations based on properties, etc can be developed.
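
As a rough sketch of the defaulting behaviour described in point 2, assuming the files have already been parsed into key/value maps by whichever loader is plugged in (Properties and MergeLayers are hypothetical names, and no XML parsing is shown):

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical type: the key/value pairs parsed from one config file.
using Properties = std::map<std::string, std::string>;

// Merges config layers in order; a key set in a later layer overrides
// the same key from an earlier one. Passing the parsed hbase-default.xml
// first and the parsed hbase-site.xml second gives "site overrides default".
Properties MergeLayers(const std::vector<Properties>& layers) {
  Properties merged;
  for (const auto& layer : layers) {
    for (const auto& entry : layer) {
      merged[entry.first] = entry.second;
    }
  }
  return merged;
}
```

Because the defaults come from the same hbase-default.xml that the Java client uses, there is no second defaults file to keep in sync.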

3. Licenses / Copyright. I think there is already agreement that GPL is a
no-no, and all copyrights have to be fixed.

4. Standard library to use: POCO / Folly / Boost / something else. I would
want us to standardize on something that everybody else uses, like Guava.
My understanding is that everybody uses Boost, so it is safe to use. I am
not particularly familiar with Folly, but if it has more traction than
POCO, it should be better to use.

5. Sync client versus async client. First, we have to differentiate between
an async client and an async RPC client. In Java, we have both an async RPC
client and a sync RPC client, but we only have a sync client, which uses the
async RPC client.

I did not check the patches in HBASE-14850. Does that have a working async
RPC client yet? If so, can we start with a sync client implementation using
the async RPC client? Since we already have the implementation, it will
give us something that can be released and used as an experimental feature.
Then in parallel, we can work on the async client and have it in the same
code base together with the sync client to give users a choice. Then we can
do further work on getting the sync version rebuilt on top of the async
one once we have confidence in the async client. How does that plan
sound?
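
To make the distinction in point 5 concrete, here is a minimal sketch of a sync client layered on an async RPC client. The interface is hypothetical (it is not the API from HBASE-14850 or HBASE-15534) and uses std::future only for illustration; EchoRpcClient is a fake that completes immediately so the example runs without a server.

```cpp
#include <future>
#include <string>

// Hypothetical request/response types.
struct Request { std::string row; };
struct Response { std::string value; };

// Assumed async RPC interface: Call() returns immediately with a future.
class AsyncRpcClient {
 public:
  virtual ~AsyncRpcClient() = default;
  virtual std::future<Response> Call(const Request& request) = 0;
};

// Fake RPC client used only for this example.
class EchoRpcClient : public AsyncRpcClient {
 public:
  std::future<Response> Call(const Request& request) override {
    std::promise<Response> promise;
    promise.set_value(Response{"echo:" + request.row});
    return promise.get_future();
  }
};

// Sync client: the API blocks, but the transport underneath is async.
class SyncClient {
 public:
  explicit SyncClient(AsyncRpcClient& rpc) : rpc_(rpc) {}

  Response Get(const std::string& row) {
    return rpc_.Call(Request{row}).get();  // wait for the RPC to complete
  }

 private:
  AsyncRpcClient& rpc_;
};
```

An async client would expose the future (or a callback) to the caller instead of waiting on it, so both clients could share the same RPC layer.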

6. Interfaces not related to async / sync client or build or standard
library. There is a bunch of code in the patches for HBASE-15534 that is not
related to async or sync (get, put, etc). Can we extract those out into a
different patch, and get them committed so that both efforts can use the
same base?

Enis


On Tue, Apr 19, 2016 at 11:20 AM, Andrew Purtell <ap...@apache.org>
wrote:

> My understanding at this point is someone wants to contribute a C++ client
> which:
> - Is a significant amount of code
> - Is a significant amount of code developed by individuals without an ICLA
> on file at the Foundation
> - Is or was GPL 3 licensed (rights holder can relicense as ASL 2.0, no
> problem)
> - May have copyleft dependencies or generate files with copyleft license
> headers (this would be a showstopper, these have to go)
> - Includes copy-paste code with third party licenses (which might be ok, as
> long as copyright headers are preserved and licensing is compatible)
>
> I would only be comfortable taking this on via the Incubator's IP Clearance
> process (http://incubator.apache.org/ip-clearance/). This should not be
> considered as a roadblock - certainly I don't mean it as such - but instead
> acknowledgement we are dealing with a code grant of uncertain IP
> provenance, so all concerned should be aware of the necessary process for
> getting it in should we want to move forward.
>
>
> On Tue, Apr 19, 2016 at 11:05 AM, Dima Spivak <ds...@cloudera.com>
> wrote:
>
> > Just to be clear, Apache 2 licensed code CAN be included in GPL 3 projects,
> > but GPL 3 licensed code CANNOT be included in Apache 2 projects (one-way
> > only). http://www.apache.org/licenses/GPL-compatibility.html provides the
> > complete story, I just raised my point early because I’ve personally
> > witnessed the pain that results from people assuming that one FOSS license
> > is just like any other.
> >
> > More broadly, I’m assuming I’m not in the minority when I say that until
> > this thread, I had no clue what was going on with these efforts. Easy
> > access to a design doc in a JIRA (if one exists) should always come before
> > an 11-page ReviewBoard drop, in my humble opinion.
> >
> > -Dima
> >
> > On Tue, Apr 19, 2016 at 2:26 AM, Priyadharshini Karthikeyan <
> > priya.darshini@hashmapinc.com> wrote:
> >
> > > While generating the configure shell script from the configure.ac file,
> > > autoconf by default installs ./install.sh and ./missing. The
> > > ownership/copyright that you are mentioning has come from those default
> > > installs and we have not copied any outside code intentionally. I agree
> > > these dependencies are not supposed to be checked in to the repo.
> > >
> > > Since Apache License version 2.0 is compatible with version 3.0 of the
> > > GPL (GNU General Public License), we used GPL for building our hbase C++
> > > client. If it is not supposed to be used, we will not use it. Thanks for
> > > pointing this out; I will address it as a high priority.
> > >
> > >
> > >
> > > On 4/19/16, 2:50 AM, "Elliott Clark" <ec...@apache.org> wrote:
> > >
> > > >On Mon, Apr 18, 2016 at 10:59 PM, <va...@hashmapinc.com> wrote:
> > > >
> > > >> Whenever we added new source files, the default template injected our
> > > >> names into those files.
> > > >>
> > > >
> > > >There are copyrights from:
> > > >
> > > >Copyright (C) 1994 X Consortium
> > > >Copyright (C) 1996-2013 Free Software Foundation, Inc.
> > > >Originally written by Fran,cois Pinard <pi...@iro.umontreal.ca>, 1996.
> > > >
> > > >None of those are you. Neither of those are auto generated from eclipse's
> > > >templates.
> > >
> > >
> >
>
>
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
>