Posted to dev@hbase.apache.org by Lars Francke <la...@gmail.com> on 2010/04/28 12:14:49 UTC

Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

I'd like to get a vote/discussion started about
https://issues.apache.org/jira/browse/HBASE-2170

I've seen multiple requests on Twitter, IRC or the mailing lists for a
simple way to start client development. This simply does not exist at
the moment and I suspect the problem will grow worse the more people
use HBase.

All the problems and possible solutions seem to be laid out in the
ticket. Problem: I just want to get all the jars required to write a
program that accesses HBase, preferably distributed via a publicly
accessible repository. At the moment this is just not easily doable,
and it is very hard to get started, as Kay Kay has laid out
beautifully (just add jars until there are no more
ClassNotFoundExceptions). It should still be easy to provide a jar
that contains everything, as it is now, if needed.

As this would take another considerable effort in moving around a
whole bunch of files and directories, I'd like to get a consensus
before one of us (it seems Paul Smith, Kay Kay and I would be
candidates) does unnecessary work on this.

I'd love your input on this (here or in the ticket), and if you have
any questions about the Maven side of this, shoot.

Cheers,
Lars

Re: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Al Lias <al...@gmx.de>.
On 17.05.2010 20:28, Stack wrote:
> ...
>>
> 
> So, if no modules, could we still produce an hbase-client via
> something like an assembly or, given that it would need a supporting
> pom, we could only do it if we keep up modules?
> 
I think you would need to keep a module; otherwise you (basically) break
the one-project, one-artifact logic of mvn. But the "module" is just one
dir plus the pom (which is the artifact); perhaps you can even get away
without a dir. If that's well maintained, it gives a much nicer (i.e.
simpler) reference for other projects building with mvn.

Don't forget about the many deep dependencies that occasionally lead to
conflicts on a nasty nested level (log4j is a nice one...).

>> If you then look further for a one-piece-jar (for the poor without
>> maven), the shade plugin would do the job.
>>
> 
> This seems easy to do in mvn.  The resultant jar though would be enormous.
> 

Yes; around 23 MB today. And when you think about putting it into some
repo, one probably has to be clear about the licenses inside, too.

A jar based on a cleaned pom as mentioned above comes to around 5 MB
(based on an exercise with hbase 0.20.3).

But the one-jar thing is a solution for reasonably simple projects only:
the classes inside could conflict with classes of other jars in your
classpath, thus even increasing the confusion.
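
One way to soften this kind of conflict is the shade plugin's class
relocation, which moves bundled classes into a different package so they
no longer collide with a second copy on the classpath. A minimal sketch,
assuming log4j is the bundled library; the target package name
org.apache.hadoop.hbase.shaded is made up for illustration and is not an
existing package:

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <relocations>
          <relocation>
            <!-- Package of the bundled library to move -->
            <pattern>org.apache.log4j</pattern>
            <!-- Hypothetical target package, chosen only for this sketch -->
            <shadedPattern>org.apache.hadoop.hbase.shaded.org.apache.log4j</shadedPattern>
          </relocation>
        </relocations>
      </configuration>
    </execution>
  </executions>
</plugin>
```

Relocation rewrites class references in the bundled bytecode; class names
that appear only in configuration files or are looked up by string may
still need manual care.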

So a well maintained pom for an artifact "hbase-client" would be great...

Al


> ...

Re: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Stack <st...@duboce.net>.
On Fri, May 14, 2010 at 3:00 PM, Al Lias <al...@gmx.de> wrote:
> oh I was not in the trunk and haven't seen the modules...
>

Yeah, trunk has modules but I'm thinking we should get rid of them.

My reasoning is that we've just undone all contribs.  They've all been
moved out to github or folded back up into core.  Now that we are
mavenized, it's easy (enough) for dependent projects to pull in the
latest hbase build.  If there are no contribs, there is no need for the
involved mvn build we currently have with submodules of core, contrib,
etc., so we should be able to move to a cleaner, flat mvn model where
there are no submodules and the mvn product is a single hbase.jar.


> The best would certainly be a sub dir + a module "hbase-client".
>
> If that is too much work with moving files around, you could start a
> hbase-client module without any files in it, just a dumb pom.xml that
> excludes everything not needed (see the attached example; projects that
> include that dependency will not go for the org.eclipse.jdt dependency
> for instance).
>

So, if no modules, could we still produce an hbase-client via
something like an assembly or, given that it would need a supporting
pom, we could only do it if we keep up modules?

> If you then look further for a one-piece-jar (for the poor without
> maven), the shade plugin would do the job.
>

This seems easy to do in mvn.  The resultant jar though would be enormous.

Thanks,
St.Ack

> Regards
>
>        Al
>
> On 14.05.2010 18:59, Lars Francke wrote:
>> Sorry. I accidentally hit "send" too soon. Damn phone keyboards.
>>
>> HBase already uses sub modules. But to split this up even further we'd have
>> to refactor the code which won't be easy.
>>
>> So what we thought of doing is provide a second .pom for the same
>> code/artifact but with a stripped down set of dependencies. There are
>> multiple ways of doing this. Do you have any experience with this use-case
>> and provide insight into a good solution? That'd be great as I've never done
>> _this_ in particular.
>>
>> Any help is appreciated.
>>
>> Cheers,
>> Lars
>>
>>
>>>
>>> On May 14, 2010 6:10 PM, "Al Lias" <al...@gmx.de> wrote:
>>>
>>> ...Maven can have submodules, ea...
>>
>>>
>>>> I'm sorry I must have missed your answer somehow.
>>>>
>>>> The problem with using 1 jar "everywhere" ...
>>
>
>

Re: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Al Lias <al...@gmx.de>.
Oh, I was not in the trunk and hadn't seen the modules...

The best would certainly be a sub dir + a module "hbase-client".

If that is too much work with moving files around, you could start a
hbase-client module without any files in it, just a dumb pom.xml that
excludes everything not needed (see the attached example; projects that
include that dependency will not go for the org.eclipse.jdt dependency
for instance).

But in any case you'll have to maintain an exclusion list that will be
quite big, and many libs have to be excluded from hbase:core AND from
hadoop:core.
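
As a rough illustration of such a files-free module (this is a sketch,
not the attachment referred to above, which is not part of this archive),
a pom-only "hbase-client" could depend on the core artifact and exclude
the server-only libraries. All coordinates and the version below are
assumptions; only the org.eclipse.jdt exclusion is taken from this
message, and the real exclusion list would be much longer:

```xml
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>

  <!-- Hypothetical coordinates for the stripped-down client artifact -->
  <groupId>org.apache.hbase</groupId>
  <artifactId>hbase-client</artifactId>
  <version>0.21.0-SNAPSHOT</version>
  <!-- No sources of its own: the pom is the whole artifact -->
  <packaging>pom</packaging>

  <dependencies>
    <dependency>
      <!-- Assumed coordinates of the existing core module -->
      <groupId>org.apache.hbase</groupId>
      <artifactId>hbase-core</artifactId>
      <version>0.21.0-SNAPSHOT</version>
      <exclusions>
        <!-- Needed for JSP compilation on the server, not by clients -->
        <exclusion>
          <groupId>org.eclipse.jdt</groupId>
          <artifactId>core</artifactId>
        </exclusion>
        <!-- Illustrative guess at another server-only library -->
        <exclusion>
          <groupId>org.mortbay.jetty</groupId>
          <artifactId>jetty</artifactId>
        </exclusion>
      </exclusions>
    </dependency>
  </dependencies>
</project>
```

With pom packaging, a consuming project would declare the dependency
with <type>pom</type>; the exclusions then apply to everything pulled in
transitively through it.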

One can also ask why these dependencies got in initially... Still, I
also think it makes sense. (We actually also use a stripped-down jar set,
reducing client sizes by 15 MB.)

If you then look further for a one-piece-jar (for the poor without
maven), the shade plugin would do the job.
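
A minimal sketch of what that could look like in the build section of
the module's pom, assuming the one-piece jar should be published next to
the normal jar rather than replace it. The classifier name "standalone"
is an arbitrary choice for this example:

```xml
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <executions>
        <execution>
          <!-- Build the one-piece jar as part of "mvn package" -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
          <configuration>
            <!-- Keep the regular jar as the main artifact and attach
                 the shaded jar under its own classifier -->
            <shadedArtifactAttached>true</shadedArtifactAttached>
            <shadedClassifierName>standalone</shadedClassifierName>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

The shaded jar bundles whatever dependencies the module declares, so its
size follows directly from how well the dependency list has been trimmed.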

Regards

	Al

On 14.05.2010 18:59, Lars Francke wrote:
> Sorry. I accidentally hit "send" too soon. Damn phone keyboards.
> 
> HBase already uses sub modules. But to split this up even further we'd have
> to refactor the code which won't be easy.
> 
> So what we thought of doing is provide a second .pom for the same
> code/artifact but with a stripped down set of dependencies. There are
> multiple ways of doing this. Do you have any experience with this use-case
> and provide insight into a good solution? That'd be great as I've never done
> _this_ in particular.
> 
> Any help is appreciated.
> 
> Cheers,
> Lars
> 
> 
>>
>> On May 14, 2010 6:10 PM, "Al Lias" <al...@gmx.de> wrote:
>>
>> ...Maven can have submodules, ea...
> 
>>
>>> I'm sorry I must have missed your answer somehow.
>>>
>>> The problem with using 1 jar "everywhere" ...
> 


Re: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Lars Francke <la...@gmail.com>.
Sorry. I accidentally hit "send" too soon. Damn phone keyboards.

HBase already uses sub modules. But to split this up even further we'd have
to refactor the code which won't be easy.

So what we thought of doing is to provide a second .pom for the same
code/artifact but with a stripped-down set of dependencies. There are
multiple ways of doing this. Do you have any experience with this use case,
and could you provide insight into a good solution? That'd be great, as
I've never done _this_ in particular.

Any help is appreciated.

Cheers,
Lars


>
> On May 14, 2010 6:10 PM, "Al Lias" <al...@gmx.de> wrote:
>
> ...Maven can have submodules, ea...

>
> > I'm sorry I must have missed your answer somehow.
> >
>> The problem with using 1 jar "everywhere" ...

Re: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Lars Francke <la...@gmail.com>.
Mr. Lias,

HBase currently uses aub

On May 14, 2010 6:10 PM, "Al Lias" <al...@gmx.de> wrote:

...Maven can have submodules, each having its own artifact and thus its
own pom. It's a simple setup. The client code seems to me a perfect
module....

Regards,

       Al


On 14.05.2010 10:54, Lars Francke wrote:

> I'm sorry I must have missed your answer somehow.
>
>> The problem with using 1 jar "everywhere" ...

Re: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Al Lias <al...@gmx.de>.
...Maven can have submodules, each having its own artifact and thus its
own pom. It's a simple setup. The client code seems to me a perfect
module....

Regards,

	Al


On 14.05.2010 10:54, Lars Francke wrote:
> I'm sorry I must have missed your answer somehow.
> 
>> The problem with using 1 jar "everywhere" is that the server
>> dependencies are more than the client ones.  So to use the client you
>> only need zookeeper and hadoop, but the server also needs JSP, JspC,
>> etc, etc, things we'd rather not have our clients have to pull into
>> their classpath.
> 
> Exactly.
> 
>> Maybe we can publish a POM with a stripped down set of deps?  Would
>> that be a reasonable half solution? :-)
> 
> That would be possible too, yes. As you aptly named it: It's a
> reasonable half solution :) It'd be great if folks could just add a
> dependency on org.apache.hbase:hbase-client and be done with it. I
> think users won't care that much about how it is done but I think it
> would be a step in the right direction and make adoption of HBase even
> easier.
> 
> The separation of code would make this distinction even clearer and
> the dependencies easier to maintain but I'm good with any (agreed
> upon) solution for this problem.
> 
> Cheers,
> Lars


Re: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Lars Francke <la...@gmail.com>.
I'm sorry I must have missed your answer somehow.

> The problem with using 1 jar "everywhere" is that the server
> dependencies are more than the client ones.  So to use the client you
> only need zookeeper and hadoop, but the server also needs JSP, JspC,
> etc, etc, things we'd rather not have our clients have to pull into
> their classpath.

Exactly.

> Maybe we can publish a POM with a stripped down set of deps?  Would
> that be a reasonable half solution? :-)

That would be possible too, yes. As you aptly named it: It's a
reasonable half solution :) It'd be great if folks could just add a
dependency on org.apache.hbase:hbase-client and be done with it. I
think users won't care that much about how it is done but I think it
would be a step in the right direction and make adoption of HBase even
easier.
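
From the user's side, that would amount to something like the following
in their own pom. This is only a sketch: the hbase-client artifact does
not exist yet, and the version shown is a placeholder rather than a
published release:

```xml
<dependencies>
  <dependency>
    <groupId>org.apache.hbase</groupId>
    <!-- Proposed artifact name from this thread; not yet published -->
    <artifactId>hbase-client</artifactId>
    <!-- Placeholder version, chosen only for illustration -->
    <version>0.21.0-SNAPSHOT</version>
  </dependency>
</dependencies>
```

Maven would then resolve the client-side dependencies transitively,
without the user having to list individual jars.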

The separation of code would make this distinction even clearer and
the dependencies easier to maintain but I'm good with any (agreed
upon) solution for this problem.

Cheers,
Lars

Re: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Ryan Rawson <ry...@gmail.com>.
The problem with using 1 jar "everywhere" is that the server
dependencies are more than the client ones.  So to use the client you
only need zookeeper and hadoop, but the server also needs JSP, JspC,
etc, etc, things we'd rather not have our clients have to pull into
their classpath.

Maybe we can publish a POM with a stripped down set of deps?  Would
that be a reasonable half solution? :-)

-ryan

On Thu, Apr 29, 2010 at 2:43 PM, Jonathan Gray <jg...@facebook.com> wrote:
> I just see the separation as added complexity.  Give me one jar and let me use it everywhere... why do I care about the lines being drawn?
>
> It doesn't seem to be the case that we're after anything "lightweight" at all.  It's more about simplicity and ease of use.
>
>
> Also, we're talking about stuffing dependent jars into the client jar (and server jar?).  So if we want to upgrade zookeeper, for example, we regenerate new client/server hbase jars?  How will this operate if we also have a zookeeper jar in the classpath outside the hbase jar?
>
>
> I think clear documentation with the way things currently work is a good way to go.  If there's an additional way for people to execute a command and get a single, client jar which is self-contained, then that could be helpful to them.  Perhaps we could ship with this jar as well?
>
>
> Again, don't really feel that strongly either way.  I’m already uncomfortable enough with maven in general and what it means for how we ship our releases (without dependencies), so we do need to pick a direction and try to make it as easy/clear/documented as possible.
>
> Thanks for continuing the conversation... other people's thoughts?
>
> JG
>
>> -----Original Message-----
>> From: Lars Francke [mailto:lars.francke@gmail.com]
>> Sent: Thursday, April 29, 2010 11:30 AM
>> To: hbase-dev@hadoop.apache.org
>> Subject: Re: Decision/Discussion about HBASE-2170: Lightweight
>> client/Refactoring of source tree
>>
>> > I don't have a strong opinion on this, but find that the core of this
>> problem seems to be a lack of simple documentation about the required
>> jars to run a client.  The practice of adding jars until no longer
>> getting CNFE would be solved with documentation stating which are
>> client-dependent jars.
>>
>> This would be the simplest solution, yes. I'm not sure if it is even
>> mentioned in the ticket ;-)
>>
>> > What exactly is the goal?  Is it to prevent this CNFE trial-and-error
>> practice?  Is it to make it so clients only need a single jar?  Or is
>> it to make a single, lightweight jar that only works for the client?
>>
>> The goal would be to make developing applications that use HBase
>> easier. An extended goal would be to make it easier for those using
>> Maven. At the moment you can depend on the SNAPSHOT libraries that we
>> publish now to the repositories. But those have a complete set of
>> dependencies even for all the server stuff that is not needed on the
>> client side.
>>
>> > Is there a lot of added value by having a single, client-only jar vs
>> a single jar that works for client and server?
>>
>> I can only talk from my personal experience (and the improved
>> documentation you mentioned would also have sufficed to a point) but
>> this separation would have made my start in HBase a lot easier because
>> there is no clear separation between the client and server and the
>> documentation is lacking.
>>
>> I also regularly read this question on IRC, the user mailing list or
>> Twitter. So I'm definitely not alone.
>>
>> > I'm all for making it easier for users, so if users say this would be
>> helpful then we should do something.  Code separation is also not a bad
>> thing.  I just never liked the hadoop separation so I don't want to
>> make things actually more complex in the process.
>>
>> What don't you like about the separation?
>> As mentioned in the ticket something like: hbase-common, hbase-server
>> and hbase-client would lend itself to what we've planned.
>>
>> All in all I don't really have a strong opinion either way - it'd be
>> just nice to get a decision on this.
>>
>> Cheers,
>> Lars
>

RE: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Jonathan Gray <jg...@facebook.com>.
I just see the separation as added complexity.  Give me one jar and let me use it everywhere... why do I care about the lines being drawn?

It doesn't seem to be the case that we're after anything "lightweight" at all.  It's more about simplicity and ease of use.


Also, we're talking about stuffing dependent jars into the client jar (and server jar?).  So if we want to upgrade zookeeper, for example, we regenerate new client/server hbase jars?  How will this operate if we also have a zookeeper jar in the classpath outside the hbase jar?


I think clear documentation with the way things currently work is a good way to go.  If there's an additional way for people to execute a command and get a single, client jar which is self-contained, then that could be helpful to them.  Perhaps we could ship with this jar as well?


Again, don't really feel that strongly either way.  I’m already uncomfortable enough with maven in general and what it means for how we ship our releases (without dependencies), so we do need to pick a direction and try to make it as easy/clear/documented as possible.

Thanks for continuing the conversation... other people's thoughts?

JG

> -----Original Message-----
> From: Lars Francke [mailto:lars.francke@gmail.com]
> Sent: Thursday, April 29, 2010 11:30 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Re: Decision/Discussion about HBASE-2170: Lightweight
> client/Refactoring of source tree
> 
> > I don't have a strong opinion on this, but find that the core of this
> problem seems to be a lack of simple documentation about the required
> jars to run a client.  The practice of adding jars until no longer
> getting CNFE would be solved with documentation stating which are
> client-dependent jars.
> 
> This would be the simplest solution, yes. I'm not sure if it is even
> mentioned in the ticket ;-)
> 
> > What exactly is the goal?  Is it to prevent this CNFE trial-and-error
> practice?  Is it to make it so clients only need a single jar?  Or is
> it to make a single, lightweight jar that only works for the client?
> 
> The goal would be to make developing applications that use HBase
> easier. An extended goal would be to make it easier for those using
> Maven. At the moment you can depend on the SNAPSHOT libraries that we
> publish now to the repositories. But those have a complete set of
> dependencies even for all the server stuff that is not needed on the
> client side.
> 
> > Is there a lot of added value by having a single, client-only jar vs
> a single jar that works for client and server?
> 
> I can only talk from my personal experience (and the improved
> documentation you mentioned would also have sufficed to a point) but
> this separation would have made my start in HBase a lot easier because
> there is no clear separation between the client and server and the
> documentation is lacking.
> 
> I also regularly read this question on IRC, the user mailing list or
> Twitter. So I'm definitely not alone.
> 
> > I'm all for making it easier for users, so if users say this would be
> helpful then we should do something.  Code separation is also not a bad
> thing.  I just never liked the hadoop separation so I don't want to
> make things actually more complex in the process.
> 
> What don't you like about the separation?
> As mentioned in the ticket something like: hbase-common, hbase-server
> and hbase-client would lend itself to what we've planned.
> 
> All in all I don't really have a strong opinion either way - it'd be
> just nice to get a decision on this.
> 
> Cheers,
> Lars

Re: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Lars Francke <la...@gmail.com>.
> I don't have a strong opinion on this, but find that the core of this problem seems to be a lack of simple documentation about the required jars to run a client.  The practice of adding jars until no longer getting CNFE would be solved with documentation stating which are client-dependent jars.

This would be the simplest solution, yes. I'm not sure if it is even
mentioned in the ticket ;-)

> What exactly is the goal?  Is it to prevent this CNFE trial-and-error practice?  Is it to make it so clients only need a single jar?  Or is it to make a single, lightweight jar that only works for the client?

The goal would be to make developing applications that use HBase
easier. An extended goal would be to make it easier for those using
Maven. At the moment you can depend on the SNAPSHOT libraries that we
now publish to the repositories. But those carry the complete set of
dependencies, including all the server stuff that is not needed on the
client side.

> Is there a lot of added value by having a single, client-only jar vs a single jar that works for client and server?

I can only talk from my personal experience (and the improved
documentation you mentioned would also have sufficed to a point) but
this separation would have made my start in HBase a lot easier because
there is no clear separation between the client and server and the
documentation is lacking.

I also regularly read this question on IRC, the user mailing list or
Twitter. So I'm definitely not alone.

> I'm all for making it easier for users, so if users say this would be helpful then we should do something.  Code separation is also not a bad thing.  I just never liked the hadoop separation so I don't want to make things actually more complex in the process.

What don't you like about the separation?
As mentioned in the ticket something like: hbase-common, hbase-server
and hbase-client would lend itself to what we've planned.

All in all I don't really have a strong opinion either way - it'd be
just nice to get a decision on this.

Cheers,
Lars

RE: Decision/Discussion about HBASE-2170: Lightweight client/Refactoring of source tree

Posted by Jonathan Gray <jg...@facebook.com>.
I don't have a strong opinion on this, but find that the core of this problem seems to be a lack of simple documentation about the required jars to run a client.  The practice of adding jars until no longer getting CNFE would be solved with documentation stating which are client-dependent jars.

What exactly is the goal?  Is it to prevent this CNFE trial-and-error practice?  Is it to make it so clients only need a single jar?  Or is it to make a single, lightweight jar that only works for the client?

Is there a lot of added value by having a single, client-only jar vs a single jar that works for client and server?

I'm all for making it easier for users, so if users say this would be helpful then we should do something.  Code separation is also not a bad thing.  I just never liked the hadoop separation so I don't want to make things actually more complex in the process.

JG


> -----Original Message-----
> From: Lars Francke [mailto:lars.francke@gmail.com]
> Sent: Wednesday, April 28, 2010 3:15 AM
> To: hbase-dev@hadoop.apache.org
> Subject: Decision/Discussion about HBASE-2170: Lightweight
> client/Refactoring of source tree
> 
> I'd like to get a vote/discussion started about
> https://issues.apache.org/jira/browse/HBASE-2170
> 
> I've seen multiple requests on Twitter, IRC or the mailing lists for a
> simple way to start client development. This simply does not exist at
> the moment and I suspect the problem will grow worse the more people
> use HBase.
> 
> All the problems and possible solutions seem to be laid out in the
> ticket. Problem: I just want to get all the required jars to write a
> program that accesses HBase, preferably those jars are distributed via
> a publicly accessible repository. At the moment this is just not
> easily doable and it is very hard to get started as Kay Kay has laid
> out beautifully (just add jars until there are no more
> ClassNotFoundExceptions). It should be easy to still provide a jar
> that contains everything as it is now if needed.
> 
> As this would take another considerable effort in moving around a
> whole bunch of files and directories I'd like to get a consensus
> before one of us (seems as if Paul Smith, Kay Kay and me would be
> candidates) does unnecessary work on this.
> 
> I'd love your input on this (here or in the ticket) and if you have
> any questions about the Maven side of this shoot.
> 
> Cheers,
> Lars