Posted to user@hbase.apache.org by Sujee Maniyam <su...@sujee.net> on 2010/06/10 23:26:28 UTC

dead-lock at HTable flushCommits with multiple clients...

I am importing data into HBase with a client running 10 threads.  I
explicitly call 'flushCommits' from each thread (after a few thousand puts).
Here is the thread-dump:

"pool-1-thread-20" prio=10 tid=0x0000000041072800 nid=0x17d8 in
Object.wait() [0x00007fdaee6c8000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at
org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
        - locked <0x00007fdb26342780> (a
org.apache.hadoop.hbase.ipc.HBaseClient$Call)
        at
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
        at $Proxy0.put(Unknown Source)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1243)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1241)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1240)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1162)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1248)
        at
org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:510)
        - locked <0x00007fdafc35da30> (a
org.apache.hadoop.hbase.client.HTable)

"pool-1-thread-19" prio=10 tid=0x000000004100c800 nid=0x17cd in
Object.wait() [0x00007fdaee7c9000]
   java.lang.Thread.State: WAITING (on object monitor)
        at java.lang.Object.wait(Native Method)
        at java.lang.Object.wait(Object.java:485)
        at
org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
        - locked <0x00007fdb25b487c0> (a
org.apache.hadoop.hbase.ipc.HBaseClient$Call)
        at
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
        at $Proxy0.put(Unknown Source)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1243)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1241)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1240)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1162)
        at
org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1248)
        at
org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
        at org.apache.hadoop.hbase.client.HTable.put(HTable.java:510)
        - locked <0x00007fdafc69d290> (a
org.apache.hadoop.hbase.client.HTable)


and so on...

Is this a known issue?  Otherwise I can open a ticket.

thanks
Sujee

http://sujee.net

Re: dead-lock at HTable flushCommits with multiple clients...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Ryan previously said that there must be something interesting going on
in the region server logs, and looking at your code I'm convinced that
you will indeed find the answer to your slowness there. Do look at them!
He also talked about the single RPC connection per client JVM, so
multiple JVMs should be faster. And the single ZK connection is fine,
since there's almost no traffic going through it.

So each value that you insert is exactly 3.4k, which is much larger
than what the default configuration is tuned for. I bet you will see a
lot of log rolling, flushes, compactions, splits, etc. Those all take
time and, when you hit one of the blocking safeguards that are in
place, clients are stopped from inserting until the condition clears.

See in these slides the settings we used for our initial import at
StumbleUpon: http://people.apache.org/~jdcryans/HUG8/HUG8-rawson.pdf.
The blocking store files and memstore block multiplier settings are
important if you have machines that can support them (also don't forget
to give more heap, you won't achieve anything with 1GB). And with such
big values, setting hbase.regionserver.hlog.blocksize to something
higher than 64MB probably makes a lot of sense. And maybe set
MAX_FILESIZE on your table to something bigger than 256MB.
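
For reference, here is a minimal hbase-site.xml sketch of the knobs
mentioned above. The property names are the stock HBase ones, but the
values are only illustrative assumptions for write-heavy machines with
enough RAM, not recommendations for any particular cluster:

  <!-- hbase-site.xml on the region servers; values are illustrative assumptions -->
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>20</value>          <!-- assumed: let more store files pile up before blocking writers -->
  </property>
  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>           <!-- assumed: tolerate bigger memstores before blocking updates -->
  </property>
  <property>
    <name>hbase.regionserver.hlog.blocksize</name>
    <value>134217728</value>   <!-- assumed: 128MB HLog blocks instead of the 64MB default -->
  </property>

The extra heap goes in conf/hbase-env.sh via HBASE_HEAPSIZE, and
MAX_FILESIZE is a table attribute set when creating or altering the table.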

Also, in your code I don't see you calling htable.flushCommits, so you
are probably missing edits at the end of the insert (whatever is still
sitting in the write buffer never gets flushed).
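
To make the flushCommits point concrete, here is a minimal per-thread
loader sketch against the 0.20-era client API (the table name
"import_test", the column family "f1" and the row/value generation are
made-up placeholders, not taken from the pastebin). Each thread builds
its own HTable, turns off auto-flush, and always flushes the write
buffer in a finally block so no buffered edits are silently dropped:

  import java.io.IOException;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class LoaderThread implements Runnable {
    private static final byte[] FAMILY = Bytes.toBytes("f1"); // assumed column family

    private final HBaseConfiguration conf;

    public LoaderThread(HBaseConfiguration conf) {
      this.conf = conf; // the config is shared, HTable instances are not
    }

    public void run() {
      try {
        // One HTable per thread; instances must never be shared across threads.
        HTable table = new HTable(conf, "import_test");
        table.setAutoFlush(false);                  // buffer puts on the client side
        table.setWriteBufferSize(12 * 1024 * 1024); // assumed 12MB write buffer
        try {
          for (int i = 0; i < 100000; i++) {
            Put put = new Put(Bytes.toBytes("row-" + i));
            put.add(FAMILY, Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
            table.put(put);                         // lands in the write buffer
          }
        } finally {
          table.flushCommits();                     // push whatever is still buffered
        }
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  }

Whether the buffer is also flushed every few thousand puts matters less
than making sure it is always flushed before the thread finishes.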

Finally, what are you trying to do? Your initial data import? If so,
there are better solutions, like HFileOutputFormat. Or are you testing
the maximum import speed you can get? Then you should probably let your
table "warm up" by running your script once to create > 100 regions,
in order to get better load distribution; otherwise the upload will be
bottlenecked on too few regions during its first phase.

J-D

On Thu, Jun 17, 2010 at 4:51 PM, Sujee Maniyam <su...@sujee.net> wrote:
> Following up on this:
>
> Here is my sample code to reproduce the issue:
> http://pastebin.com/vTX8Pu7c
>
> I am importing data from a single JVM, using multiple threads (10).
>
> Each thread creates its own instance of HTable.  But I see only one
> 'ZooKeeper connection' in the output.  Is that right?
>
> For the same import code, my throughput is cut by a factor of 4 going
> from 0.20.3 to 0.20.4.  The current write speed is a bit slow for our needs.
>
> 1) Are there any parameters I can tweak?  I have already disabled
> 'auto flush'.
> 2) If multi-threaded writes aren't going to be effective, should I
> consider running multiple JVM processes?
>
> thanks
> Sujee
>
> http://sujee.net
>
>
> On Thu, Jun 10, 2010 at 3:41 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:
>
>> Also 0.20.4 has the ExplicitColumnTracker that spins in an infinite
>> loop in some situations.
>>
>> J-D
>>
>> On Thu, Jun 10, 2010 at 3:38 PM, Ryan Rawson <ry...@gmail.com> wrote:
>> > hey,
>> >
>> > so you have discovered a particular 'trick' about how the HBase RPC
>> > works... at the lowest level there is only 1 socket, shared by all
>> > threads, to talk to all the regionservers.  Thus if you are sending a
>> > large amount of data to HBase you can see this become a bottleneck.
>> >
>> > It is highly likely there might be something interesting in the
>> > HRegionServer logs, perhaps the regionserver is blocking because it's
>> > trying to keep from being overrun (we ship with very conservative
>> > defaults).  There was a recent thread about this too... the thread was
>> > titled "ideas to improve throughput of the base writting".
>> >
>> > -ryan
>> >
>> >
>> > On Thu, Jun 10, 2010 at 3:17 PM, Sujee Maniyam <su...@sujee.net> wrote:
>> >> Forgot to mention that I am using HBase 0.20.4.
>> >>
>> >
>>
>

Re: dead-lock at HTable flushCommits with multiple clients...

Posted by Sujee Maniyam <su...@sujee.net>.
Following up on this:

Here is my sample code to reproduce the issue:
http://pastebin.com/vTX8Pu7c

I am importing data from a single JVM, using multiple threads (10).

Each thread creates its own instance of HTable.  But I see only one
'ZooKeeper connection' in the output.  Is that right?

For the same import code, my throughput is cut by a factor of 4 going
from 0.20.3 to 0.20.4.  The current write speed is a bit slow for our needs.

1) Are there any parameters I can tweak?  I have already disabled
'auto flush'.
2) If multi-threaded writes aren't going to be effective, should I
consider running multiple JVM processes?

thanks
Sujee

http://sujee.net


On Thu, Jun 10, 2010 at 3:41 PM, Jean-Daniel Cryans <jd...@apache.org>wrote:

> Also 0.20.4 has the ExplicitColumnTracker that spins in an infinite
> loop in some situations.
>
> J-D
>
> On Thu, Jun 10, 2010 at 3:38 PM, Ryan Rawson <ry...@gmail.com> wrote:
> > hey,
> >
> > so you have discovered a particular 'trick' about how the HBase RPC
> > works... at the lowest level there is only 1 socket, shared by all
> > threads, to talk to all the regionservers.  Thus if you are sending a
> > large amount of data to HBase you can see this become a bottleneck.
> >
> > It is highly likely there might be something interesting in the
> > HRegionServer logs, perhaps the regionserver is blocking because it's
> > trying to keep from being overrun (we ship with very conservative
> > defaults).  There was a recent thread about this too... the thread was
> > titled "ideas to improve throughput of the base writting".
> >
> > -ryan
> >
> >
> > On Thu, Jun 10, 2010 at 3:17 PM, Sujee Maniyam <su...@sujee.net> wrote:
> >> Forgot to mention that I am using HBase 0.20.4.
> >>
> >
>

Re: dead-lock at HTable flushCommits with multiple clients...

Posted by Jean-Daniel Cryans <jd...@apache.org>.
Also 0.20.4 has the ExplicitColumnTracker that spins in an infinite
loop in some situations.

J-D

On Thu, Jun 10, 2010 at 3:38 PM, Ryan Rawson <ry...@gmail.com> wrote:
> hey,
>
> so you have discovered a particular 'trick' about how the HBase RPC
> works... at the lowest level there is only 1 socket, shared by all
> threads, to talk to all the regionservers.  Thus if you are sending a
> large amount of data to HBase you can see this become a bottleneck.
>
> It is highly likely there might be something interesting in the
> HRegionServer logs, perhaps the regionserver is blocking because it's
> trying to keep from being overrun (we ship with very conservative
> defaults).  There was a recent thread about this too... the thread was
> titled "ideas to improve throughput of the base writting".
>
> -ryan
>
>
> On Thu, Jun 10, 2010 at 3:17 PM, Sujee Maniyam <su...@sujee.net> wrote:
>> Forgot to mention that I am using HBase 0.20.4.
>>
>

Re: dead-lock at HTable flushCommits with multiple clients...

Posted by Ryan Rawson <ry...@gmail.com>.
hey,

so you have discovered a particular 'trick' about how the HBase RPC
works... at the lowest level there is only 1 socket, shared by all
threads, to talk to all the regionservers.  Thus if you are sending a
large amount of data to HBase you can see this become a bottleneck.

It is highly likely there might be something interesting in the
HRegionServer logs, perhaps the regionserver is blocking because it's
trying to keep from being overrun (we ship with very conservative
defaults).  There was a recent thread about this too... the thread was
titled "ideas to improve throughput of the base writting".

-ryan


On Thu, Jun 10, 2010 at 3:17 PM, Sujee Maniyam <su...@sujee.net> wrote:
> Forgot to mention that I am using HBase 0.20.4.
>

Re: dead-lock at HTable flushCommits with multiple clients...

Posted by Sujee Maniyam <su...@sujee.net>.
Forgot to mention that I am using HBase 0.20.4.

Re: dead-lock at HTable flushCommits with multiple clients...

Posted by Sujee Maniyam <su...@sujee.net>.
More log: http://pastebin.com/nVYdJb3v

HTable is not shared; each thread creates its own HTable instance.

The import goes along for a few minutes, and I can see 'requests' on
the HBase Master UI.  Then the client hangs and the requests drop to zero.

http://sujee.net

RE: dead-lock at HTable flushCommits with multiple clients...

Posted by Jonathan Gray <jg...@facebook.com>.
HTable is explicitly not thread-safe when using the write buffer.
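
One way to respect that constraint while still reusing instances is to
confine each HTable to a single thread, for example with a ThreadLocal.
A minimal sketch against the 0.20-era client API (the table name
"import_test" is a made-up placeholder):

  import java.io.IOException;

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;

  public class PerThreadTable {
    private static final HBaseConfiguration CONF = new HBaseConfiguration();

    // Each thread lazily gets its own HTable, so the unsynchronized
    // write buffer is never touched by more than one thread.
    private static final ThreadLocal<HTable> TABLE = new ThreadLocal<HTable>() {
      protected HTable initialValue() {
        try {
          HTable table = new HTable(CONF, "import_test");
          table.setAutoFlush(false);
          return table;
        } catch (IOException e) {
          throw new RuntimeException(e);
        }
      }
    };

    public static HTable get() {
      return TABLE.get();
    }
  }

Each thread still has to call flushCommits() on its own table before it
finishes; the ThreadLocal only guarantees that the write buffer is not
shared.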

> -----Original Message-----
> From: Ryan Rawson [mailto:ryanobjc@gmail.com]
> Sent: Thursday, June 10, 2010 2:34 PM
> To: user@hbase.apache.org
> Cc: hbase-user
> Subject: Re: dead-lock at HTable flushCommits with multiple clients...
> 
> I would need to see more of the jstack to feel comfortable calling
> this a deadlock... However, HTable is not designed to be used by
> multiple threads.  Each thread should have its own copy of HTable.
> 
> -ryan
> 
> On Thu, Jun 10, 2010 at 2:26 PM, Sujee Maniyam <su...@sujee.net> wrote:
> > I am importing data into HBase with a client running 10 threads.  I
> > explicitly call 'flushCommits' from each thread (after a few thousand puts).
> >
> > Here is the thread-dump:
> >
> > "pool-1-thread-20" prio=10 tid=0x0000000041072800 nid=0x17d8 in Object.wait() [0x00007fdaee6c8000]
> >    java.lang.Thread.State: WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at java.lang.Object.wait(Object.java:485)
> >         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
> >         - locked <0x00007fdb26342780> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
> >         at $Proxy0.put(Unknown Source)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1243)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1241)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1240)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1162)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1248)
> >         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
> >         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:510)
> >         - locked <0x00007fdafc35da30> (a org.apache.hadoop.hbase.client.HTable)
> >
> > "pool-1-thread-19" prio=10 tid=0x000000004100c800 nid=0x17cd in Object.wait() [0x00007fdaee7c9000]
> >    java.lang.Thread.State: WAITING (on object monitor)
> >         at java.lang.Object.wait(Native Method)
> >         at java.lang.Object.wait(Object.java:485)
> >         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
> >         - locked <0x00007fdb25b487c0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
> >         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
> >         at $Proxy0.put(Unknown Source)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1243)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1241)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1240)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1162)
> >         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1248)
> >         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
> >         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:510)
> >         - locked <0x00007fdafc69d290> (a org.apache.hadoop.hbase.client.HTable)
> >
> >
> > and so on...
> >
> > Is this a known issue?  Otherwise I can open a ticket.
> >
> > thanks
> > Sujee
> >
> > http://sujee.net
> >

Re: dead-lock at HTable flushCommits with multiple clients...

Posted by Ryan Rawson <ry...@gmail.com>.
I would need to see more of the jstack to feel comfortable calling
this a deadlock... However, HTable is not designed to be used by
multiple threads.  Each thread should have its own copy of HTable.

-ryan

On Thu, Jun 10, 2010 at 2:26 PM, Sujee Maniyam <su...@sujee.net> wrote:
> I am importing data into HBase with a client running 10 threads.  I
> explicitly call 'flushCommits' from each thread (after a few thousand puts).
>
> Here is the thread-dump:
>
> "pool-1-thread-20" prio=10 tid=0x0000000041072800 nid=0x17d8 in Object.wait() [0x00007fdaee6c8000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
>         - locked <0x00007fdb26342780> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>         at $Proxy0.put(Unknown Source)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1243)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1241)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1240)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1162)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1248)
>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:510)
>         - locked <0x00007fdafc35da30> (a org.apache.hadoop.hbase.client.HTable)
>
> "pool-1-thread-19" prio=10 tid=0x000000004100c800 nid=0x17cd in Object.wait() [0x00007fdaee7c9000]
>    java.lang.Thread.State: WAITING (on object monitor)
>         at java.lang.Object.wait(Native Method)
>         at java.lang.Object.wait(Object.java:485)
>         at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:721)
>         - locked <0x00007fdb25b487c0> (a org.apache.hadoop.hbase.ipc.HBaseClient$Call)
>         at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:333)
>         at $Proxy0.put(Unknown Source)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1243)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3$1.call(HConnectionManager.java:1241)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionServerWithRetries(HConnectionManager.java:1050)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$3.doCall(HConnectionManager.java:1240)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers$Batch.process(HConnectionManager.java:1162)
>         at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.processBatchOfRows(HConnectionManager.java:1248)
>         at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:666)
>         at org.apache.hadoop.hbase.client.HTable.put(HTable.java:510)
>         - locked <0x00007fdafc69d290> (a org.apache.hadoop.hbase.client.HTable)
>
>
> and so on...
>
> Is this a known issue?  Otherwise I can open a ticket.
>
> thanks
> Sujee
>
> http://sujee.net
>