Posted to user@hbase.apache.org by 梁景明 <fu...@gmail.com> on 2010/11/03 09:56:55 UTC

about hbase security

Hi, are there any features that let me control client access to my
HBase cluster,
such as authorization, users, or passwords?

Right now the only control I have is
iptables rules on my servers. Is there a better way?
Thanks.

RE: HBase as a versioned key/value store

Posted by Jonathan Gray <jg...@facebook.com>.

Ah, reads.  Totally different story.

Are you using the block cache?  How much heap do you have configured for your RSs?

Some of that debug should be displaying the stats of your block cache... Want to paste a few lines of that?

> -----Original Message-----
> From: Wojciech Langiewicz [mailto:wlangiewicz@gmail.com]
> Sent: Wednesday, November 03, 2010 7:15 AM
> To: user@hbase.apache.org
> Subject: Re: HBase as a versioned key/value store
> 
> Hello,
> 
> 2010/11/3 Jonathan Gray <jg...@facebook.com>
> 
> > Hi Wojciech,
> >
> > HBase can easily be used as a versioned key/value store.  I'd say that's
> > one of the easiest ways to use it.
> >
> > To help you get more throughput, you'll have to provide more details.
> >
> > What version are you running, what kind of hardware / configuration, and
> > what does your client look like that is doing the writes?
> >
> 
> I'm running latest version from Cloudera, and I'm running that script
> directly from one of the servers. All 4 of them are: 2xXeon 5400, 16GB RAM,
> 2x1TB HDD.
> 
> 
> > Each KV is a distinct Put operation?  Normally people get high throughput
> > by batching many Puts at once.
> >
> 
> Actually, here I'm asking about Get operations, because I don't know how to
> batch them (by design). But in case of Puts you are right.
> 
> 
> > During your writes, what do you see in the RegionServer logs?  Are things
> > calm or a lot of things happening?  You could also be dealing with some cold
> > start issues if you don't have enough regions in your table or if your
> > writes are not distributed across the keyspace.
> >
> 
> Writes are equally distributed, I see it with www interface, regionserver
> logs are calm, only debug information appears from time to time (about every
> minute).
>
> I'm rather asking what I can expect from my schema design and hardware by
> comparing other people's solutions, right now I'm getting 10 times less
> performance than I initially wanted.
> 
> 
> >
> > JG
> >
> > > -----Original Message-----
> > > From: Wojciech Langiewicz [mailto:wlangiewicz@gmail.com]
> > > Sent: Wednesday, November 03, 2010 4:19 AM
> > > To: user@hbase.apache.org
> > > Subject: HBase as a versioned key/value store
> > >
> > > Hello,
> > > I would like to know if any of you are using HBase as a versioned
> > > key/value store. What I mean by versioned is keys map to multiple
> > > values with timestamp.
> > >
> > > So the whole table would have many rows and only one column family with
> > > one column.
> > >
> > > I'm trying to work out what performance I could get from this design,
> > > because right now I can only get about 1500 - 2000 requests per second
> > > on 4 medium servers, and I'd like to get about 10x more. I know that
> > > it's not optimal use for HBase, but if only it was 10x faster I wouldn't
> > > have to use any other system for this.
> > >
> > > Thanks in advance for any ideas.
> > >
> > > --
> > > Wojciech Langiewicz
> >
> 
> 
> 
> --
> Wojciech Langiewicz

Re: HBase as a versioned key/value store

Posted by Wojciech Langiewicz <wl...@gmail.com>.

Hi,
In one go I do about 5M reads. In this case running the task for longer didn't
affect performance.
Regarding DISTINCT: I'm doing this through Pig - I've written a UDF for
getting rows, and I've now started to wonder whether this might affect
performance. I have run more tests, and at no point is CPU usage greater
than 50%, and the disks are barely used.
From the logs I see that the cache hit ratio reaches about 97%.
I have tested latency from the hbase shell: the first get takes 0.7120 seconds,
but subsequent ones take 0.04 seconds, which is as expected, so I think that Pig
might be slowing everything down.

2010/11/5 Stack <st...@duboce.net>

> On Thu, Nov 4, 2010 at 4:06 AM, Wojciech Langiewicz
> <wl...@gmail.com> wrote:
> > I didn't notice any improvement after changing option
> > hfile.block.cache.size, I don't know if this is relevant, but in my testing
> > job I do at most only one Get per row (before querying HBase I do DISTINCT).
> >
> > Stats from cache reads are here: http://pastebin.com/BmmL09dK
> > This is after restarting servers, and during running first job.
> >
>
> How many reads did you do?  I see the cache hit ratio climbing as your
> test progresses.  Run it for longer?  What kinda latency are you
> seeing?
>
> Coming out of cache you should be seeing < 5ms or so?
>
> How are you accessing HBase (The DISTINCT above makes me wonder).
>
> Thanks,
> St.Ack
>



-- 
Wojciech Langiewicz

Re: HBase as a versioned key/value store

Posted by Stack <st...@duboce.net>.

On Thu, Nov 4, 2010 at 4:06 AM, Wojciech Langiewicz
<wl...@gmail.com> wrote:
> I didn't notice any improvement after changing option
> hfile.block.cache.size, I don't know if this is relevant, but in my testing
> job I do at most only one Get per row (before querying HBase I do DISTINCT).
>
> Stats from cache reads are here: http://pastebin.com/BmmL09dK
> This is after restarting servers, and during running first job.
>

How many reads did you do?  I see the cache hit ratio climbing as your
test progresses.  Run it for longer?  What kinda latency are you
seeing?

Coming out of cache you should be seeing < 5ms or so?

How are you accessing HBase (The DISTINCT above makes me wonder).

Thanks,
St.Ack

Re: HBase as a versioned key/value store

Posted by Wojciech Langiewicz <wl...@gmail.com>.

Right now I have 4GB of heap per regionserver, and as Stack suggested, I
have set hfile.block.cache.size to 0.5.
While I'm doing Gets, nothing else is running that would affect
performance. Cells are very small - they contain 1 integer - and this table
has about 20M rows. It spans 4 regionservers, and I have about 64
regions, each 256MB.
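
The figures in this message can be turned into a rough estimate of how much of the table the block cache could hold. This is only a sketch: the function names are illustrative, and treating every region as a full 256MB is an assumption that gives an upper bound. With 20M rows of one integer each, the real data set may be considerably smaller.

```python
# Rough block-cache sizing from the figures in this message. Treating each
# region as a full 256 MB is an assumption (an upper bound); real regions
# holding small cells may be far smaller.

def block_cache_gb(heap_gb: float, cache_fraction: float) -> float:
    """Heap given to the block cache on one regionserver, in GB."""
    return heap_gb * cache_fraction

def table_upper_bound_gb(regions: int, region_size_mb: float) -> float:
    """Upper bound on table size if every region were full, in GB."""
    return regions * region_size_mb / 1024.0

def cacheable_fraction(cache_gb: float, data_gb: float) -> float:
    """Fraction of the data that could sit in cache at once (capped at 1)."""
    return min(1.0, cache_gb / data_gb)

if __name__ == "__main__":
    per_server = block_cache_gb(heap_gb=4.0, cache_fraction=0.5)
    cluster = per_server * 4  # 4 regionservers
    data = table_upper_bound_gb(regions=64, region_size_mb=256)
    print(f"block cache: {per_server:.1f} GB per server, {cluster:.1f} GB total")
    print(f"table upper bound: {data:.1f} GB")
    print(f"cacheable fraction: {cacheable_fraction(cluster, data):.2f}")
```

Under these assumptions the cache covers at least half of the table, so once the cache is warm, misses alone seem unlikely to explain a 10x shortfall.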

I use RAID, but this will change soon; it takes time (we're moving to
new servers).

I didn't notice any improvement after changing option
hfile.block.cache.size. I don't know if this is relevant, but in my testing
job I do at most one Get per row (before querying HBase I do DISTINCT).

Stats from cache reads are here: http://pastebin.com/BmmL09dK
This is after restarting servers, and during running first job.

Thanks for helping me.

2010/11/3 Stack <st...@duboce.net>

> On Wed, Nov 3, 2010 at 7:15 AM, Wojciech Langiewicz
> <wl...@gmail.com> wrote:
> >
> > I'm running latest version from Cloudera
>
>
> Try a later version of the 0.89 series.  See the downloads page on our
> site.   It has perf. improvements.
>
>
> >> Each KV is a distinct Put operation?  Normally people get high throughput
> >> by batching many Puts at once.
> >>
> >
> > Actually, here I'm asking about Get operations, because I don't know how to
> > batch them (by design). But in case of Puts you are right.
> >
>
> There is a batch Get in TRUNK that should be available as 0.90.0RC0 soon.
>
> > I'm rather asking what I can expect from my schema design and hardware by
> > comparing other people's solutions, right now I'm getting 10 times less
> > performance than I initially wanted.
> >
> >
> Well, if reads are going to disk, we're talking 10-30ms a hit.  If you
> are reading from cache, you should see 5ms or less.  Try upping the
> proportion of your heap given over to block cache; set
> hfile.block.cache.size to 0.4 or 0.5 of heap (Writes should be going
> in pretty fast -- ~5ms or less).
>
> What size your cells?  How many regions in your table?   How much RAM
> have you given over to HBase?   Anything else running on these
> machines?  You doing any wacky RAID'ing on those disks?
>
> Good luck,
> St.Ack
>

-- 
Wojciech Langiewicz

Re: HBase as a versioned key/value store

Posted by Stack <st...@duboce.net>.

On Wed, Nov 3, 2010 at 7:15 AM, Wojciech Langiewicz
<wl...@gmail.com> wrote:
>
> I'm running latest version from Cloudera


Try a later version of the 0.89 series.  See the downloads page on our
site.   It has perf. improvements.


>> Each KV is a distinct Put operation?  Normally people get high throughput
>> by batching many Puts at once.
>>
>
> Actually, here I'm asking about Get operations, because I don't know how to
> batch them (by design). But in case of Puts you are right.
>

There is a batch Get in TRUNK that should be available as 0.90.0RC0 soon.

> I'm rather asking what I can expect from my schema design and hardware by
> comparing other people's solutions, right now I'm getting 10 times less
> performance than I initially wanted.
>
>
Well, if reads are going to disk, we're talking 10-30ms a hit.  If you
are reading from cache, you should see 5ms or less.  Try upping the
proportion of your heap given over to block cache; set
hfile.block.cache.size to 0.4 or 0.5 of heap (Writes should be going
in pretty fast -- ~5ms or less).
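
To relate these latency figures to requests per second, here is a back-of-the-envelope sketch using Little's law (throughput = requests in flight / latency). The function names are illustrative, and the 20,000 requests per second target is an assumption taken as 10x the upper end of the 1500 - 2000 figure from the original question. It ignores client overhead, network time, and server-side contention.

```python
# Back-of-the-envelope throughput from per-request latency (Little's law:
# throughput = requests in flight / latency). Latencies are the rough
# figures quoted above; the target rate is an assumption.

def throughput_per_second(latency_ms: float, in_flight: int = 1) -> float:
    """Requests per second sustained by `in_flight` concurrent requests."""
    return in_flight * 1000.0 / latency_ms

def in_flight_needed(target_rps: float, latency_ms: float) -> float:
    """Concurrent requests needed to reach `target_rps` at a given latency."""
    return target_rps * latency_ms / 1000.0

if __name__ == "__main__":
    target = 20000  # assumed goal: 10x the reported 2000 req/s
    for label, latency_ms in [("cache", 5.0), ("disk, best", 10.0), ("disk, worst", 30.0)]:
        serial = throughput_per_second(latency_ms)
        needed = in_flight_needed(target, latency_ms)
        print(f"{label}: {serial:.0f} req/s serial, "
              f"{needed:.0f} in flight for {target} req/s")
```

By this estimate, a single client issuing one Get at a time tops out at roughly 200 requests per second even when every read comes from cache, so reaching the target depends mostly on client-side concurrency or batching.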

What size your cells?  How many regions in your table?   How much RAM
have you given over to HBase?   Anything else running on these
machines?  You doing any wacky RAID'ing on those disks?

Good luck,
St.Ack

Re: HBase as a versioned key/value store

Posted by Wojciech Langiewicz <wl...@gmail.com>.

Hello,

2010/11/3 Jonathan Gray <jg...@facebook.com>

> Hi Wojciech,
>
> HBase can easily be used as a versioned key/value store.  I'd say that's
> one of the easiest ways to use it.
>
> To help you get more throughput, you'll have to provide more details.
>
> What version are you running, what kind of hardware / configuration, and
> what does your client look like that is doing the writes?
>

I'm running latest version from Cloudera, and I'm running that script
directly from one of the servers. All 4 of them are: 2xXeon 5400, 16GB RAM,
2x1TB HDD.


> Each KV is a distinct Put operation?  Normally people get high throughput
> by batching many Puts at once.
>

Actually, here I'm asking about Get operations, because I don't know how to
batch them (by design). But in case of Puts you are right.


> During your writes, what do you see in the RegionServer logs?  Are things
> calm or a lot of things happening?  You could also be dealing with some cold
> start issues if you don't have enough regions in your table or if your
> writes are not distributed across the keyspace.
>

Writes are equally distributed, I see it with www interface, regionserver
logs are calm, only debug information appears from time to time (about every
minute).

I'm rather asking what I can expect from my schema design and hardware by
comparing other people's solutions. Right now I'm getting 10 times less
performance than I initially wanted.


>
> JG
>
> > -----Original Message-----
> > From: Wojciech Langiewicz [mailto:wlangiewicz@gmail.com]
> > Sent: Wednesday, November 03, 2010 4:19 AM
> > To: user@hbase.apache.org
> > Subject: HBase as a versioned key/value store
> >
> > Hello,
> > I would like to know if any of you are using HBase as a versioned
> > key/value store. What I mean by versioned is keys map to multiple
> > values with timestamp.
> >
> > So the whole table would have many rows and only one column family with
> > one column.
> >
> > I'm trying to work out what performance I could get from this design,
> > because right now I can only get about 1500 - 2000 requests per second
> > on 4 medium servers, and I'd like to get about 10x more. I know that
> > it's not optimal use for HBase, but if only it was 10x faster I wouldn't
> > have to use any other system for this.
> >
> > Thanks in advance for any ideas.
> >
> > --
> > Wojciech Langiewicz
>



-- 
Wojciech Langiewicz

RE: HBase as a versioned key/value store

Posted by Jonathan Gray <jg...@facebook.com>.

Hi Wojciech,

HBase can easily be used as a versioned key/value store.  I'd say that's one of the easiest ways to use it.

To help you get more throughput, you'll have to provide more details.

What version are you running, what kind of hardware / configuration, and what does your client look like that is doing the writes?

Each KV is a distinct Put operation?  Normally people get high throughput by batching many Puts at once.
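
As an illustration of why batching helps, the sketch below groups items into fixed-size batches and counts the resulting client/server round trips. It is deliberately generic Python and does not use the HBase client API; the function names and the batch size are hypothetical.

```python
# Generic batching sketch: group items so each network round trip carries
# many operations. Not HBase client code; names and sizes are hypothetical.
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def batches(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, batch
        yield batch

def round_trips(n_items: int, batch_size: int) -> int:
    """Number of round trips needed to send `n_items` in batches."""
    return -(-n_items // batch_size)  # ceiling division

if __name__ == "__main__":
    n = 1000
    print(f"unbatched: {round_trips(n, 1)} round trips")
    print(f"batches of 100: {round_trips(n, 100)} round trips")
```

The per-operation work on the server is unchanged; what batching saves is the fixed cost paid once per round trip.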

During your writes, what do you see in the RegionServer logs?  Are things calm or a lot of things happening?  You could also be dealing with some cold start issues if you don't have enough regions in your table or if your writes are not distributed across the keyspace.

JG

> -----Original Message-----
> From: Wojciech Langiewicz [mailto:wlangiewicz@gmail.com]
> Sent: Wednesday, November 03, 2010 4:19 AM
> To: user@hbase.apache.org
> Subject: HBase as a versioned key/value store
> 
> Hello,
> I would like to know if any of you are using HBase as a versioned
> key/value store. What I mean by versioned is keys map to multiple
> values with timestamp.
>
> So the whole table would have many rows and only one column family with
> one column.
>
> I'm trying to work out what performance I could get from this design,
> because right now I can only get about 1500 - 2000 requests per second
> on 4 medium servers, and I'd like to get about 10x more. I know that
> it's not optimal use for HBase, but if only it was 10x faster I wouldn't
> have to use any other system for this.
> 
> Thanks in advance for any ideas.
> 
> --
> Wojciech Langiewicz

HBase as a versioned key/value store

Posted by Wojciech Langiewicz <wl...@gmail.com>.

Hello,
I would like to know if any of you are using HBase as a versioned
key/value store. What I mean by versioned is that keys map to multiple values,
each with a timestamp.

So the whole table would have many rows and only one column family with
one column.

I'm trying to work out what performance I could get from this design,
because right now I can only get about 1500 - 2000 requests per second
on 4 medium servers, and I'd like to get about 10x more. I know that
it's not an optimal use of HBase, but if only it was 10x faster I wouldn't
have to use any other system for this.

Thanks in advance for any ideas.

--
Wojciech Langiewicz

Re: about hbase security

Posted by Gary Helmling <gh...@gmail.com>.

HBase access control features are in active development at the moment.
Currently we're building on top of secure Hadoop and using Kerberos for
client authentication, with HBase providing additional tools for managing
access to individual tables or column families.  See the following issues in
JIRA:

https://issues.apache.org/jira/browse/HBASE-1697
https://issues.apache.org/jira/browse/HBASE-3025

The current code is functional for basic read/write permissions, but is
definitely a work in progress.  If you'd like to track the progress, you can
find it at: https://github.com/trendmicro/hbase/tree/security

But the code base is in active development, changing frequently, and depends
upon some features that are not yet present in HBase trunk.

If you have specific needs for security, please comment on the JIRAs or
start checking out the code.  Additional thoughts or contributions are
definitely welcome!

Regardless of any security layer within HBase, however, you'll still want to
firewall off your cluster and limit access to only your clients at the
iptables/netfilter level.  Having a publicly exposed Hadoop/HBase cluster is
not a good idea.

Gary

On Wed, Nov 3, 2010 at 6:30 AM, Sean Bigdatafun
<se...@gmail.com> wrote:

> CDH3beta3 seems to provide what you want (ACL)
>
>
> On Wed, Nov 3, 2010 at 1:56 AM, 梁景明 <fu...@gmail.com> wrote:
>
> > Hi, are there any features that let me control client access to my
> > HBase cluster,
> > such as authorization, users, or passwords?
> >
> > Right now the only control I have is
> > iptables rules on my servers. Is there a better way?
> > Thanks.
> >
>
>
>
> --
> --Sean
>

Re: about hbase security

Posted by Sean Bigdatafun <se...@gmail.com>.

CDH3beta3 seems to provide what you want (ACL)


On Wed, Nov 3, 2010 at 1:56 AM, 梁景明 <fu...@gmail.com> wrote:

> Hi, are there any features that let me control client access to my
> HBase cluster,
> such as authorization, users, or passwords?
>
> Right now the only control I have is
> iptables rules on my servers. Is there a better way?
> Thanks.
>



-- 
--Sean