Posted to user@hbase.apache.org by Adam Silberstein <si...@yahoo-inc.com> on 2009/12/06 00:32:21 UTC

Problems with read ops when table size is large

Hi,
I'm having problems doing client operations when my table is large.  I did
an initial test like this:
6 servers
6 GB heap size per server
20 million 1K recs (so ~3 GB per server)

I was able to do at least 5,000 random read/write operations per second.

I then increased my table size to
120 million 1K recs (so ~20 GB per server)

I then put a very light load of random reads on the table: 20 reads per
second.  I'm able to do a few, but within 10-20 seconds, they all fail.  I
found many errors of the following type in the hbase master log:

java.io.IOException: java.io.IOException: Could not obtain block:
blk_-7409743019137510182_39869
file=/hbase/.META./1028785192/info/2540865741541403627

If I wait about 5 minutes, I can repeat this sequence (do a few operations,
then get errors).

If anyone has any suggestions or needs me to list particular settings, let
me know.  The odd thing is that I observe no problems and great performance
with a smaller table.

Thanks,
Adam



Re: Problems with read ops when table size is large

Posted by Andrew Purtell <ap...@apache.org>.
See http://issues.apache.org/jira/browse/HDFS-223 (Asynchronous IO Handling in Hadoop and HDFS) for background. As the data grows, the xcievers value the cluster requires goes up. When I build a new cluster I start with dfs.datanode.max.xcievers=4096, which provides a lot of headroom for growth. On another JIRA from the 0.18 timeframe the HDFS guys suggest starting with 2K.
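
For reference, a minimal hdfs-site.xml entry would look something like this
(the property name keeps the historical "xciever" misspelling, and the
datanodes need a restart to pick it up):

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>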

   - Andy




________________________________
From: Ryan Rawson <ry...@gmail.com>
To: hbase-user@hadoop.apache.org
Sent: Sat, December 5, 2009 11:17:37 PM
Subject: Re: Problems with read ops when table size is large

Looks like your problem is #4:  "dfs.datanode.max.xcievers"

You need to set this in hadoop/conf/hdfs-site.xml

Change this and I'm sure your experience will improve.

Good luck.
-ryan

On Sat, Dec 5, 2009 at 11:07 PM, Adam Silberstein
<si...@yahoo-inc.com> wrote:
> Thanks for the suggestions.  Let me run down what I tried:
> 1. My ulimit was already much higher than 1024, so no change there.
> 2. I was not using hdfs-127.  I switched to that.  I didn't use M/R to do my
> initial load, by the way.
> 3. I was a little unclear on which handler counts to increase and to what.
> I changed hbase.regionserver.handler.count, dfs.namenode.handler.count, and
> dfs.datanode.handler.count all from 10 to 100.
> 4. I did see the error that I was exceeding the dfs.datanode.max.xcievers
> value 256.  What's odd is that I have that set to ~3000, but it's apparently
> not getting picked up by hdfs when it starts.  Any ideas there (like is it
> really xceivers)?
> 5. I'm not sure how many regions per regionserver.  What's a good way to
> check that?
> 6. Didn't get to checking for missing block.
>
> Ultimately, either #2 or #3 or both helped.  I was able to push throughput
> way up without seeing the error recur.  So thanks a lot for the help!  I'm
> still interested in getting the best performance possible.  So if you think
> fixing the xciever problem will help, I'd like to spend some more time
> there.
>
> Thanks,
> Adam
>
>
> On 12/5/09 9:38 PM, "stack" <st...@duboce.net> wrote:
>
>> See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6.  Different hdfs
>> complaint but make sure your ulimit is > 1024 (check first or second line in
>> master log -- it prints out what hbase is seeing for ulimit), check that
>> hdfs-127 is applied to the first hadoop that hbase sees on CLASSPATH (this
>> is particularly important if your loading script is a mapreduce task,
>> clients might not be seeing the patched hadoop that hbase ships with).  Also
>> up the handler count for hdfs (the referred to timeout is no longer
>> pertinent I believe) and while you are at it, those for hbase if you haven't
>> changed them from defaults.  While you are at it, make sure you don't suffer
>> from http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5.
>>
>> How many regions per regionserver?
>>
>> Can you put a regionserver log somewhere I can pull it to take a look?
>>
>> For a "Could not obtain block message", what happens if you take the
>> filename -- 2540865741541403627 in the below -- and grep NameNode.  Does it
>> tell you anything?
>>
>> St.Ack
>>
>> On Sat, Dec 5, 2009 at 3:32 PM, Adam Silberstein
>> <si...@yahoo-inc.com>wrote:
>>
>>> Hi,
>>> I'm having problems doing client operations when my table is large.  I did
>>> an initial test like this:
>>> 6 servers
>>> 6 GB heap size per server
>>> 20 million 1K recs (so ~3 GB per server)
>>>
>>> I was able to do at least 5,000 random read/write operations per second.
>>>
>>> I then increased my table size to
>>> 120 million 1K recs (so ~20 GB per server)
>>>
>>> I then put a very light load of random reads on the table: 20 reads per
>>> second.  I'm able to do a few, but within 10-20 seconds, they all fail.  I
>>> found many errors of the following type in the hbase master log:
>>>
>>> java.io.IOException: java.io.IOException: Could not obtain block:
>>> blk_-7409743019137510182_39869
>>> file=/hbase/.META./1028785192/info/2540865741541403627
>>>
>>> If I wait about 5 minutes, I can repeat this sequence (do a few operations,
>>> then get errors).
>>>
>>> If anyone has any suggestions or needs me to list particular settings, let
>>> me know.  The odd thing is that I observe no problems and great performance
>>> with a smaller table.
>>>
>>> Thanks,
>>> Adam
>>>
>>>
>>>
>
>



      

Re: Problems with read ops when table size is large

Posted by Ryan Rawson <ry...@gmail.com>.
Looks like your problem is #4:  "dfs.datanode.max.xcievers"

You need to set this in hadoop/conf/hdfs-site.xml

Change this and I'm sure your experience will improve.

Good luck.
-ryan

On Sat, Dec 5, 2009 at 11:07 PM, Adam Silberstein
<si...@yahoo-inc.com> wrote:
> Thanks for the suggestions.  Let me run down what I tried:
> 1. My ulimit was already much higher than 1024, so no change there.
> 2. I was not using hdfs-127.  I switched to that.  I didn't use M/R to do my
> initial load, by the way.
> 3. I was a little unclear on which handler counts to increase and to what.
> I changed hbase.regionserver.handler.count, dfs.namenode.handler.count, and
> dfs.datanode.handler.count all from 10 to 100.
> 4. I did see the error that I was exceeding the dfs.datanode.max.xcievers
> value 256.  What's odd is that I have that set to ~3000, but it's apparently
> not getting picked up by hdfs when it starts.  Any ideas there (like is it
> really xceivers)?
> 5. I'm not sure how many regions per regionserver.  What's a good way to
> check that?
> 6. Didn't get to checking for missing block.
>
> Ultimately, either #2 or #3 or both helped.  I was able to push throughput
> way up without seeing the error recur.  So thanks a lot for the help!  I'm
> still interested in getting the best performance possible.  So if you think
> fixing the xciever problem will help, I'd like to spend some more time
> there.
>
> Thanks,
> Adam
>
>
> On 12/5/09 9:38 PM, "stack" <st...@duboce.net> wrote:
>
>> See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6.  Different hdfs
>> complaint but make sure your ulimit is > 1024 (check first or second line in
>> master log -- it prints out what hbase is seeing for ulimit), check that
>> hdfs-127 is applied to the first hadoop that hbase sees on CLASSPATH (this
>> is particularly important if your loading script is a mapreduce task,
>> clients might not be seeing the patched hadoop that hbase ships with).  Also
>> up the handler count for hdfs (the referred to timeout is no longer
>> pertinent I believe) and while you are at it, those for hbase if you haven't
>> changed them from defaults.  While you are at it, make sure you don't suffer
>> from http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5.
>>
>> How many regions per regionserver?
>>
>> Can you put a regionserver log somewhere I can pull it to take a look?
>>
>> For a "Could not obtain block message", what happens if you take the
>> filename -- 2540865741541403627 in the below -- and grep NameNode.  Does it
>> tell you anything?
>>
>> St.Ack
>>
>> On Sat, Dec 5, 2009 at 3:32 PM, Adam Silberstein
>> <si...@yahoo-inc.com>wrote:
>>
>>> Hi,
>>> I'm having problems doing client operations when my table is large.  I did
>>> an initial test like this:
>>> 6 servers
>>> 6 GB heap size per server
>>> 20 million 1K recs (so ~3 GB per server)
>>>
>>> I was able to do at least 5,000 random read/write operations per second.
>>>
>>> I then increased my table size to
>>> 120 million 1K recs (so ~20 GB per server)
>>>
>>> I then put a very light load of random reads on the table: 20 reads per
>>> second.  I'm able to do a few, but within 10-20 seconds, they all fail.  I
>>> found many errors of the following type in the hbase master log:
>>>
>>> java.io.IOException: java.io.IOException: Could not obtain block:
>>> blk_-7409743019137510182_39869
>>> file=/hbase/.META./1028785192/info/2540865741541403627
>>>
>>> If I wait about 5 minutes, I can repeat this sequence (do a few operations,
>>> then get errors).
>>>
>>> If anyone has any suggestions or needs me to list particular settings, let
>>> me know.  The odd thing is that I observe no problems and great performance
>>> with a smaller table.
>>>
>>> Thanks,
>>> Adam
>>>
>>>
>>>
>
>

Re: Problems with read ops when table size is large

Posted by stack <st...@duboce.net>.
On Sat, Dec 5, 2009 at 11:07 PM, Adam Silberstein <si...@yahoo-inc.com>wrote:

> 2. I was not using hdfs-127.  I switched to that.  I didn't use M/R to do
> my
> initial load, by the way.
>

HBase needs an hdfs-127 patched DFSClient.  If your "client operations" are
being done from MR, then you probably need to patch your hdfs (Hadoop 0.20.2
and 0.21.0 will ship with hdfs-127 applied).  Otherwise, your hbase client
is probably using the hadoop that hbase ships with.  This already has
hdfs-127 applied.



> 3. I was a little unclear on which handler counts to increase and to what.
> I changed hbase.regionserver.handler.count, dfs.namenode.handler.count, and
> dfs.datanode.handler.count all from 10 to 100.
>

That'll do it.



> 4. I did see the error that I was exceeding the dfs.datanode.max.xcievers
> value 256.  What's odd is that I have that set to ~3000, but it's
> apparently
> not getting picked up by hdfs when it starts.  Any ideas there (like is it
> really xceivers)?
>

Yes.  It's the misspelling -- hdfs only recognizes the property under the
misspelled "xcievers" name.


> 5. I'm not sure how many regions per regionserver.  What's a good way to
>> check that?
>

HBase has a web UI.  The master's shows on port 60010 by default, the
regionservers' on 60030.  The master UI's first page lists all participating
servers and, among other things, what they are carrying.  The number of
regions per server is a rough measure of cluster loading.
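
If the UI ports have been moved on your cluster, they are controlled by the
info-port settings in hbase-site.xml.  A sketch with the default values,
assuming the standard property names, would be:

  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>60030</value>
  </property>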


> 6. Didn't get to checking for missing block.
>
> Ultimately, either #2 or #3 or both helped.  I was able to push throughput
> way up without seeing the error recur.




It was probably #3, but depending on how you were running your "client
operations", it could have been #2.

The "Could not obtain block" comes up out of DFSClient.  Here is where the
message comes up from:

    private DNAddrPair chooseDataNode(LocatedBlock block)
      throws IOException {
      while (true) {
        DatanodeInfo[] nodes = block.getLocations();
        try {
          DatanodeInfo chosenNode = bestNode(nodes, deadNodes);
          InetSocketAddress targetAddr =
                            NetUtils.createSocketAddr(chosenNode.getName());
          return new DNAddrPair(chosenNode, targetAddr);
        } catch (IOException ie) {
          String blockInfo = block.getBlock() + " file=" + src;
          if (failures >= maxBlockAcquireFailures) {
            throw new BlockMissingException(src, "Could not obtain block: " + blockInfo,
                                            block.getStartOffset());
          }

          if (nodes == null || nodes.length == 0) {
            LOG.info("No node available for block: " + blockInfo);
          }
          LOG.info("Could not obtain block " + block.getBlock()
              + " from any node: " + ie
              + ". Will get new block locations from namenode and
retry...");
          try {
            Thread.sleep(3000);
          } catch (InterruptedException iex) {
          }
          deadNodes.clear(); //2nd option is to remove only nodes[blockId]
          openInfo();
          block = getBlockAt(block.getStartOffset(), false);
          failures++;
          continue;
        }
      }
    }

bestNode is likely throwing the exception.  It does little more than look at
the list of datanodes provided by the namenode and check whether they are in
the dead-node list.  Datanodes could be in the dead-node list because of #2 --
we accumulate failures too easily -- or because of #3 -- we couldn't get onto
the datanode because of heavy contention and too few handlers.



>  So thanks a lot for the help!  I'm
> still interested in getting the best performance possible.



If it is a read load, more RAM means more caching, so your performance should
go up.  Do you have lzo enabled?  A few of the lads reckon it is the single
best thing you can do to boost performance.  What is your loading profile?
Does it have a strong character or is it a mix?  Messing with GC settings
should help too.  If there is lots of writing, make sure you are not seeing
write barriers come down on regionservers because flushing is lagging behind
write rates.  See hbase.hregion.memstore.block.multiplier; if you have enough
RAM, perhaps double it.  Make sure write load is not hampered by
hbase.hstore.blockingStoreFiles, and make sure writing is not running up
against hbase.regionserver.global.memstore.upperLimit.
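
To make those knobs concrete, a rough hbase-site.xml sketch follows; the first
value assumes the stock multiplier default of 2, and the store-file count is
only an illustrative guess rather than a number from this thread, so treat
both as starting points:

  <property>
    <name>hbase.hregion.memstore.block.multiplier</name>
    <value>4</value>   <!-- doubled, as suggested above, if you have the RAM -->
  </property>
  <property>
    <name>hbase.hstore.blockingStoreFiles</name>
    <value>15</value>  <!-- illustrative; raise it if writes stall waiting on compactions -->
  </property>
  <property>
    <name>hbase.regionserver.global.memstore.upperLimit</name>
    <value>0.4</value> <!-- fraction of heap all memstores may use before updates block -->
  </property>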

St.Ack



>  So if you think
> fixing the xciever problem will help, I'd like to spend some more time
> there.
>
> Thanks,
> Adam
>
>
> On 12/5/09 9:38 PM, "stack" <st...@duboce.net> wrote:
>
> > See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6.  Different
> hdfs
> > complaint but make sure your ulimit is > 1024 (check first or second line
> in
> > master log -- it prints out what hbase is seeing for ulimit), check that
> > hdfs-127 is applied to the first hadoop that hbase sees on CLASSPATH
> (this
> > is particularly important if your loading script is a mapreduce task,
> > clients might not be seeing the patched hadoop that hbase ships with).
>  Also
> > up the handler count for hdfs (the referred to timeout is no longer
> > pertinent I believe) and while you are at it, those for hbase if you
> haven't
> > changed them from defaults.  While you are at it, make sure you don't
> suffer
> > from http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5.
> >
> > How many regions per regionserver?
> >
> > Can you put a regionserver log somewhere I can pull it to take a look?
> >
> > For a "Could not obtain block message", what happens if you take the
> > filename -- 2540865741541403627 in the below -- and grep NameNode.  Does
> it
> > tell you anything?
> >
> > St.Ack
> >
> > On Sat, Dec 5, 2009 at 3:32 PM, Adam Silberstein
> > <si...@yahoo-inc.com>wrote:
> >
> >> Hi,
> >> I'm having problems doing client operations when my table is large.  I
> did
> >> an initial test like this:
> >> 6 servers
> >> 6 GB heap size per server
> >> 20 million 1K recs (so ~3 GB per server)
> >>
> >> I was able to do at least 5,000 random read/write operations per second.
> >>
> >> I then increased my table size to
> >> 120 million 1K recs (so ~20 GB per server)
> >>
> >> I then put a very light load of random reads on the table: 20 reads per
> >> second.  I'm able to do a few, but within 10-20 seconds, they all fail.
>  I
> >> found many errors of the following type in the hbase master log:
> >>
> >> java.io.IOException: java.io.IOException: Could not obtain block:
> >> blk_-7409743019137510182_39869
> >> file=/hbase/.META./1028785192/info/2540865741541403627
> >>
> >> If I wait about 5 minutes, I can repeat this sequence (do a few
> operations,
> >> then get errors).
> >>
> >> If anyone has any suggestions or needs me to list particular settings,
> let
> >> me know.  The odd thing is that I observe no problems and great
> performance
> >> with a smaller table.
> >>
> >> Thanks,
> >> Adam
> >>
> >>
> >>
>
>

Re: Problems with read ops when table size is large

Posted by Adam Silberstein <si...@yahoo-inc.com>.
Thanks for the suggestions.  Let me run down what I tried:
1. My ulimit was already much higher than 1024, so no change there.
2. I was not using hdfs-127.  I switched to that.  I didn't use M/R to do my
initial load, by the way.
3. I was a little unclear on which handler counts to increase and to what.
I changed hbase.regionserver.handler.count, dfs.namenode.handler.count, and
dfs.datanode.handler.count all from 10 to 100 (see the config sketch after
this list).
4. I did see the error that I was exceeding the dfs.datanode.max.xcievers
value 256.  What's odd is that I have that set to ~3000, but it's apparently
not getting picked up by hdfs when it starts.  Any ideas there (like is it
really xceivers)?
5. I'm not sure how many regions per regionserver.  What's a good way to
check that?
6. Didn't get to checking for missing block.
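
For reference on items 3 and 4, the relevant entries look roughly like this
(values as described above; the last one is where my ~3000 is supposed to
land, with the misspelled property name):

  <!-- hbase-site.xml -->
  <property>
    <name>hbase.regionserver.handler.count</name>
    <value>100</value>
  </property>

  <!-- hdfs-site.xml; the namenode and datanodes need a restart to pick these up -->
  <property>
    <name>dfs.namenode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.datanode.handler.count</name>
    <value>100</value>
  </property>
  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>3000</value>
  </property>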

Ultimately, either #2 or #3 or both helped.  I was able to push throughput
way up without seeing the error recur.  So thanks a lot for the help!  I'm
still interested in getting the best performance possible.  So if you think
fixing the xciever problem will help, I'd like to spend some more time
there.  

Thanks,
Adam


On 12/5/09 9:38 PM, "stack" <st...@duboce.net> wrote:

> See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6.  Different hdfs
> complaint but make sure your ulimit is > 1024 (check first or second line in
> master log -- it prints out what hbase is seeing for ulimit), check that
> hdfs-127 is applied to the first hadoop that hbase sees on CLASSPATH (this
> is particularly important if your loading script is a mapreduce task,
> clients might not be seeing the patched hadoop that hbase ships with).  Also
> up the handler count for hdfs (the referred to timeout is no longer
> pertinent I believe) and while you are at it, those for hbase if you haven't
> changed them from defaults.  While you are at it, make sure you don't suffer
> from http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5.
> 
> How many regions per regionserver?
> 
> Can you put a regionserver log somewhere I can pull it to take a look?
> 
> For a "Could not obtain block message", what happens if you take the
> filename -- 2540865741541403627 in the below -- and grep NameNode.  Does it
> tell you anything?
> 
> St.Ack
> 
> On Sat, Dec 5, 2009 at 3:32 PM, Adam Silberstein
> <si...@yahoo-inc.com>wrote:
> 
>> Hi,
>> I'm having problems doing client operations when my table is large.  I did
>> an initial test like this:
>> 6 servers
>> 6 GB heap size per server
>> 20 million 1K recs (so ~3 GB per server)
>> 
>> I was able to do at least 5,000 random read/write operations per second.
>> 
>> I then increased my table size to
>> 120 million 1K recs (so ~20 GB per server)
>> 
>> I then put a very light load of random reads on the table: 20 reads per
>> second.  I'm able to do a few, but within 10-20 seconds, they all fail.  I
>> found many errors of the following type in the hbase master log:
>> 
>> java.io.IOException: java.io.IOException: Could not obtain block:
>> blk_-7409743019137510182_39869
>> file=/hbase/.META./1028785192/info/2540865741541403627
>> 
>> If I wait about 5 minutes, I can repeat this sequence (do a few operations,
>> then get errors).
>> 
>> If anyone has any suggestions or needs me to list particular settings, let
>> me know.  The odd thing is that I observe no problems and great performance
>> with a smaller table.
>> 
>> Thanks,
>> Adam
>> 
>> 
>> 


Re: Problems with read ops when table size is large

Posted by stack <st...@duboce.net>.
See http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A6.  Different hdfs
complaint but make sure your ulimit is > 1024 (check first or second line in
master log -- it prints out what hbase is seeing for ulimit), check that
hdfs-127 is applied to the first hadoop that hbase sees on CLASSPATH (this
is particularly important if your loading script is a mapreduce task,
clients might not be seeing the patched hadoop that hbase ships with).  Also
up the handler count for hdfs (the referred to timeout is no longer
pertinent I believe) and while you are at it, those for hbase if you haven't
changed them from defaults.  While you are at it, make sure you don't suffer
from http://wiki.apache.org/hadoop/Hbase/Troubleshooting#A5.

How many regions per regionserver?

Can you put a regionserver log somewhere I can pull it to take a look?

For a "Could not obtain block message", what happens if you take the
filename -- 2540865741541403627 in the below -- and grep NameNode.  Does it
tell you anything?

St.Ack

On Sat, Dec 5, 2009 at 3:32 PM, Adam Silberstein <si...@yahoo-inc.com>wrote:

> Hi,
> I'm having problems doing client operations when my table is large.  I did
> an initial test like this:
> 6 servers
> 6 GB heap size per server
> 20 million 1K recs (so ~3 GB per server)
>
> I was able to do at least 5,000 random read/write operations per second.
>
> I then increased my table size to
> 120 million 1K recs (so ~20 GB per server)
>
> I then put a very light load of random reads on the table: 20 reads per
> second.  I'm able to do a few, but within 10-20 seconds, they all fail.  I
> found many errors of the following type in the hbase master log:
>
> java.io.IOException: java.io.IOException: Could not obtain block:
> blk_-7409743019137510182_39869
> file=/hbase/.META./1028785192/info/2540865741541403627
>
> If I wait about 5 minutes, I can repeat this sequence (do a few operations,
> then get errors).
>
> If anyone has any suggestions or needs me to list particular settings, let
> me know.  The odd thing is that I observe no problems and great performance
> with a smaller table.
>
> Thanks,
> Adam
>
>
>