Posted to user@hbase.apache.org by Peter Haidinyak <ph...@local.com> on 2011/04/22 19:40:51 UTC

Row Key Question

I have a question on how HBase decides to save rows based on Row Keys. Say I have a million rows to insert into a new table in a ten node cluster. Each row's key is some random 32 byte value and there are two columns per row, each column contains some random 32 byte value. 
My question is how does HBase know when to 'split' the table between the ten nodes? Or how does HBase 'split' the random keys between the ten nodes? 

Thanks

-Pete

Re: Row Key Question

Posted by Jean-Daniel Cryans <jd...@apache.org>.
A region splits when it reaches a configured size (the default is
256MB). A table starts with one region and splits as needed as you
insert. For a bit more info see:
http://hbase.apache.org/book.html#regions.arch
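
For example, if you want the random keys spread across the cluster
right away instead of waiting for size-based splits, the table can be
pre-split at creation time. A rough, untested sketch against the
0.90-era client API (table and family names are placeholders, and it
assumes the keys are uniformly random bytes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class PreSplitExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // Placeholder table with one column family.
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("d"));

        // For uniformly random keys, split the key space evenly up front:
        // 15 split points => 16 regions at 0x10, 0x20, ... 0xF0.
        byte[][] splitKeys = new byte[15][];
        for (int i = 0; i < 15; i++) {
          splitKeys[i] = new byte[] { (byte) ((i + 1) * 0x10) };
        }
        admin.createTable(desc, splitKeys);
      }
    }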

J-D

On Fri, Apr 22, 2011 at 10:40 AM, Peter Haidinyak <ph...@local.com> wrote:
> I have a question on how HBase decides to save rows based on Row Keys. Say I have a million rows to insert into a new table in a ten node cluster. Each row's key is some random 32 byte value and there are two columns per row, each column contains some random 32 byte value.
> My question is how does HBase know when to 'split' the table between the ten nodes? Or how does HBase 'split' the random keys between the ten nodes?
>
> Thanks
>
> -Pete
>

Re: Splitlog() executed while the namenode was in safemode may cause data-loss

Posted by Stack <st...@duboce.net>.
That sounds like a good idea.   If you don't mind, please file an
issue and make a patch.
Thank you,
St.Ack

On Sun, Apr 24, 2011 at 5:53 PM, bijieshan <bi...@huawei.com> wrote:
> Under the current HDFS version, there's no method to tell whether the namenode is in safemode.
> Maybe we can handle the SafeModeException at the layer that calls checkFileSystem(), wait for a while, and retry the operation.
> Does that sound reasonable? I hope someone can give some advice.
>
> Thanks!
> Jieshan Bean
>
> -----Original Message-----
> From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
> Sent: Sunday, April 24, 2011 5:55 AM
> To: user@hbase.apache.org
> Cc: Chenjian
> Subject: Re: Splitlog() executed while the namenode was in safemode may cause data-loss
>
> Sorry, what did you change?
> Thanks,
> St.Ack
>
> On Fri, Apr 22, 2011 at 9:00 PM, bijieshan <bi...@huawei.com> wrote:
>> Hi,
>> I found this problem when the namenode went into safemode for reasons that are still unclear.
>> There's one existing patch related to this problem:
>>
>>   try {
>>      HLogSplitter splitter = HLogSplitter.createLogSplitter(
>>        conf, rootdir, logDir, oldLogDir, this.fs);
>>      try {
>>        splitter.splitLog();
>>      } catch (OrphanHLogAfterSplitException e) {
>>        LOG.warn("Retrying splitting because of:", e);
>>        // An HLogSplitter instance can only be used once.  Get new instance.
>>        splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
>>          oldLogDir, this.fs);
>>        splitter.splitLog();
>>      }
>>      splitTime = splitter.getTime();
>>      splitLogSize = splitter.getSize();
>>    } catch (IOException e) {
>>      checkFileSystem();
>>      LOG.error("Failed splitting " + logDir.toString(), e);
>>      master.abort("Shutting down HBase cluster: Failed splitting hlog files...", e);
>>    } finally {
>>      this.splitLogLock.unlock();
>>    }
>>
>> That patch really does help to some extent when the namenode process has exited or been killed, but it does not consider the namenode safemode exception.
>>   I think the root cause is the checkFileSystem() method.
>>   It is meant to check whether HDFS is working normally (reads and writes succeed), which is probably the original purpose of this method. This is how the method is implemented:
>>
>>    DistributedFileSystem dfs = (DistributedFileSystem) fs;
>>    try {
>>      if (dfs.exists(new Path("/"))) {
>>        return;
>>      }
>>    } catch (IOException e) {
>>      exception = RemoteExceptionHandler.checkIOException(e);
>>    }
>>
>>   I have checked the HDFS code and learned that while the namenode is in safemode, dfs.exists(new Path("/")) returns true, because the file system can still provide read-only service. So this method only checks whether the dfs can be read, and I think that is not sufficient.
>>
>>
>> Regards,
>> Jieshan Bean
>

Re: Splitlog() executed while the namenode was in safemode may cause data-loss

Posted by bijieshan <bi...@huawei.com>.
Under the current HDFS version, there's no method to tell whether the namenode is in safemode.
Maybe we can handle the SafeModeException at the layer that calls checkFileSystem(), wait for a while, and retry the operation.
Does that sound reasonable? I hope someone can give some advice.
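
For example, the caller could loop roughly like this (just a sketch of
the idea, not a tested patch; MAX_SPLIT_RETRIES, SAFEMODE_RETRY_SLEEP_MS
and isSafeModeException() are made-up names, not existing HBase/HDFS API):

    private void splitLogWithSafemodeRetry() throws IOException, InterruptedException {
      int attempts = 0;
      while (true) {
        // An HLogSplitter instance can only be used once; get a fresh one per attempt.
        HLogSplitter splitter = HLogSplitter.createLogSplitter(
            conf, rootdir, logDir, oldLogDir, this.fs);
        try {
          splitter.splitLog();
          return;
        } catch (IOException e) {
          IOException cause = RemoteExceptionHandler.checkIOException(e);
          if (isSafeModeException(cause) && ++attempts < MAX_SPLIT_RETRIES) {
            LOG.warn("Namenode appears to be in safemode; retrying log split", cause);
            Thread.sleep(SAFEMODE_RETRY_SLEEP_MS);
            continue;
          }
          throw cause;
        }
      }
    }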

Thanks!
Jieshan Bean

-----Original Message-----
From: saint.ack@gmail.com [mailto:saint.ack@gmail.com] On Behalf Of Stack
Sent: Sunday, April 24, 2011 5:55 AM
To: user@hbase.apache.org
Cc: Chenjian
Subject: Re: Splitlog() executed while the namenode was in safemode may cause data-loss

Sorry, what did you change?
Thanks,
St.Ack

On Fri, Apr 22, 2011 at 9:00 PM, bijieshan <bi...@huawei.com> wrote:
> Hi,
> I found this problem when the namenode went into safemode for reasons that are still unclear.
> There's one existing patch related to this problem:
>
>   try {
>      HLogSplitter splitter = HLogSplitter.createLogSplitter(
>        conf, rootdir, logDir, oldLogDir, this.fs);
>      try {
>        splitter.splitLog();
>      } catch (OrphanHLogAfterSplitException e) {
>        LOG.warn("Retrying splitting because of:", e);
>        // An HLogSplitter instance can only be used once.  Get new instance.
>        splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
>          oldLogDir, this.fs);
>        splitter.splitLog();
>      }
>      splitTime = splitter.getTime();
>      splitLogSize = splitter.getSize();
>    } catch (IOException e) {
>      checkFileSystem();
>      LOG.error("Failed splitting " + logDir.toString(), e);
>      master.abort("Shutting down HBase cluster: Failed splitting hlog files...", e);
>    } finally {
>      this.splitLogLock.unlock();
>    }
>
> That patch really does help to some extent when the namenode process has exited or been killed, but it does not consider the namenode safemode exception.
>   I think the root cause is the checkFileSystem() method.
>   It is meant to check whether HDFS is working normally (reads and writes succeed), which is probably the original purpose of this method. This is how the method is implemented:
>
>    DistributedFileSystem dfs = (DistributedFileSystem) fs;
>    try {
>      if (dfs.exists(new Path("/"))) {
>        return;
>      }
>    } catch (IOException e) {
>      exception = RemoteExceptionHandler.checkIOException(e);
>    }
>
>   I have checked the HDFS code and learned that while the namenode is in safemode, dfs.exists(new Path("/")) returns true, because the file system can still provide read-only service. So this method only checks whether the dfs can be read, and I think that is not sufficient.
>
>
> Regards,
> Jieshan Bean
>

Re: Splitlog() executed while the namenode was in safemode may cause data-loss

Posted by Stack <st...@duboce.net>.
Sorry, what did you change?
Thanks,
St.Ack

On Fri, Apr 22, 2011 at 9:00 PM, bijieshan <bi...@huawei.com> wrote:
> Hi,
> I found this problem when the namenode went into safemode for reasons that are still unclear.
> There's one existing patch related to this problem:
>
>   try {
>      HLogSplitter splitter = HLogSplitter.createLogSplitter(
>        conf, rootdir, logDir, oldLogDir, this.fs);
>      try {
>        splitter.splitLog();
>      } catch (OrphanHLogAfterSplitException e) {
>        LOG.warn("Retrying splitting because of:", e);
>        // An HLogSplitter instance can only be used once.  Get new instance.
>        splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
>          oldLogDir, this.fs);
>        splitter.splitLog();
>      }
>      splitTime = splitter.getTime();
>      splitLogSize = splitter.getSize();
>    } catch (IOException e) {
>      checkFileSystem();
>      LOG.error("Failed splitting " + logDir.toString(), e);
>      master.abort("Shutting down HBase cluster: Failed splitting hlog files...", e);
>    } finally {
>      this.splitLogLock.unlock();
>    }
>
> That patch really does help to some extent when the namenode process has exited or been killed, but it does not consider the namenode safemode exception.
>   I think the root cause is the checkFileSystem() method.
>   It is meant to check whether HDFS is working normally (reads and writes succeed), which is probably the original purpose of this method. This is how the method is implemented:
>
>    DistributedFileSystem dfs = (DistributedFileSystem) fs;
>    try {
>      if (dfs.exists(new Path("/"))) {
>        return;
>      }
>    } catch (IOException e) {
>      exception = RemoteExceptionHandler.checkIOException(e);
>    }
>
>   I have checked the HDFS code and learned that while the namenode is in safemode, dfs.exists(new Path("/")) returns true, because the file system can still provide read-only service. So this method only checks whether the dfs can be read, and I think that is not sufficient.
>
>
> Regards,
> Jieshan Bean
>

Splitlog() executed while the namenode was in safemode may cause data-loss

Posted by bijieshan <bi...@huawei.com>.
Hi,
I found this problem when the namenode went into safemode for reasons that are still unclear.
There's one existing patch related to this problem:

   try {
      HLogSplitter splitter = HLogSplitter.createLogSplitter(
        conf, rootdir, logDir, oldLogDir, this.fs);
      try {
        splitter.splitLog();
      } catch (OrphanHLogAfterSplitException e) {
        LOG.warn("Retrying splitting because of:", e);
        // An HLogSplitter instance can only be used once.  Get new instance.
        splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
          oldLogDir, this.fs);
        splitter.splitLog();
      }
      splitTime = splitter.getTime();
      splitLogSize = splitter.getSize();
    } catch (IOException e) {
      checkFileSystem();
      LOG.error("Failed splitting " + logDir.toString(), e);
      master.abort("Shutting down HBase cluster: Failed splitting hlog files...", e);
    } finally {
      this.splitLogLock.unlock();
    }

That patch really does help to some extent when the namenode process has exited or been killed, but it does not consider the namenode safemode exception.
   I think the root cause is the checkFileSystem() method.
   It is meant to check whether HDFS is working normally (reads and writes succeed), which is probably the original purpose of this method. This is how the method is implemented:

    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    try {
      if (dfs.exists(new Path("/"))) {  
        return;
      }
    } catch (IOException e) {
      exception = RemoteExceptionHandler.checkIOException(e);
    }
   
   I have checked the HDFS code and learned that while the namenode is in safemode, dfs.exists(new Path("/")) returns true, because the file system can still provide read-only service. So this method only checks whether the dfs can be read, and I think that is not sufficient.
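
   One possible direction (only a sketch of the idea, not a tested patch;
   the probe file name is invented) is to probe with a small write instead
   of a read, since writes are what safemode actually blocks:

    // Sketch: verify HDFS accepts writes, not only reads. A namenode in
    // safemode fails the create with a (remote) SafeModeException, which a
    // read-only probe like dfs.exists(new Path("/")) never does.
    DistributedFileSystem dfs = (DistributedFileSystem) fs;
    Path probe = new Path(this.rootdir, ".fs_write_probe_" + System.currentTimeMillis());
    try {
      FSDataOutputStream out = dfs.create(probe, true);
      out.close();
      dfs.delete(probe, false);
      return;
    } catch (IOException e) {
      exception = RemoteExceptionHandler.checkIOException(e);
    }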
    
    
Regards,
Jieshan Bean

RE: Row Key Question

Posted by Peter Haidinyak <ph...@local.com>.
Thanks for the link, nice doodles :-) He kind of validates my thoughts: sequential keys = BAD, but if you must use them, add a prefix. I'm hoping that over time the keys will end up with a better distribution and I can still do a scan using a start and end row. I'll see how it distributes on my home system.

-Pete


-----Original Message-----
From: jdcryans@gmail.com [mailto:jdcryans@gmail.com] On Behalf Of Jean-Daniel Cryans
Sent: Friday, April 22, 2011 1:32 PM
To: user@hbase.apache.org
Subject: Re: Row Key Question

That's almost exactly what Mozilla is doing with Socorro (google for
their presentations).

Also you seem to assume things about the region balancer that are, at
least at the moment, untrue:

> Then the assumption is this process would continue until every server in the cluster has one region of data

That's more like the end result rather than the goal.

> Then during retrieval I could use ten threads, each using a start and end row with its prefix, and the query should be distributed evenly among the servers.

Nothing is done to make sure that your regions will be distributed
that way; the last region for each salt key may very well end up on
the same region server. That's why it's better to use more salting.

And have you seen this?
http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/

J-D

On Fri, Apr 22, 2011 at 1:18 PM, Peter Haidinyak <ph...@local.com> wrote:
> Thanks, that's the way I visualized it happening. Then the assumption is this process would continue until every server in the cluster has one region of data (more or less). My underlying question is that I need to store my data with the key starting with the date (YYYY-MM-DD). I know this means I will have hot spots during inserts, but it makes retrieval more efficient by using a scan with start and end rows. I was thinking of adding a prefix number of 00 to 09, for the ten servers. In theory, each server should end up with only one of the prefixes. Then during retrieval I could use ten threads, each using a start and end row with its prefix, and the query should be distributed evenly among the servers. I'm not sure if using ten threads to insert the data would buy me anything or not. Anyway, I'm going to try this out at home on my own cluster to see how it performs.
>
> Thanks
>
> -Pete
>
> -----Original Message-----
> From: Buttler, David [mailto:buttler1@llnl.gov]
> Sent: Friday, April 22, 2011 12:10 PM
> To: user@hbase.apache.org
> Subject: RE: Row Key Question
>
> Regions split when they grow larger than the configured region size.  Your data is small enough to fit in a single region.
>
> Keys are sorted within a region.  When a region splits, the new regions are each about half the size of the original and each contains half of the key space.
>
> Dave
>
> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Friday, April 22, 2011 10:41 AM
> To: user@hbase.apache.org
> Subject: Row Key Question
>
> I have a question on how HBase decides to save rows based on Row Keys. Say I have a million rows to insert into a new table in a ten node cluster. Each row's key is some random 32 byte value and there are two columns per row, each column contains some random 32 byte value.
> My question is how does HBase know when to 'split' the table between the ten nodes? Or how does HBase 'split' the random keys between the ten nodes?
>
> Thanks
>
> -Pete
>

Re: Row Key Question

Posted by Jean-Daniel Cryans <jd...@apache.org>.
That's almost exactly what Mozilla is doing with Socorro (google for
their presentations).

Also you seem to assume things about the region balancer that are, at
least at the moment, untrue:

> Then the assumption is this process would continue until every server in the cluster has one region of data

That's more like the end result rather than the goal.

> Then during retrieval I could use ten threads, each using a start and end row with its prefix, and the query should be distributed evenly among the servers.

Nothing is done to make sure that your regions will be distributed
that way; the last region for each salt key may very well end up on
the same region server. That's why it's better to use more salting.

And have you seen this?
http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/

J-D

On Fri, Apr 22, 2011 at 1:18 PM, Peter Haidinyak <ph...@local.com> wrote:
> Thanks, that's the way I visualized it happening. Then the assumption is this process would continue until every server in the cluster has one region of data (more or less). My underlying question is that I need to store my data with the key starting with the date (YYYY-MM-DD). I know this means I will have hot spots during inserts, but it makes retrieval more efficient by using a scan with start and end rows. I was thinking of adding a prefix number of 00 to 09, for the ten servers. In theory, each server should end up with only one of the prefixes. Then during retrieval I could use ten threads, each using a start and end row with its prefix, and the query should be distributed evenly among the servers. I'm not sure if using ten threads to insert the data would buy me anything or not. Anyway, I'm going to try this out at home on my own cluster to see how it performs.
>
> Thanks
>
> -Pete
>
> -----Original Message-----
> From: Buttler, David [mailto:buttler1@llnl.gov]
> Sent: Friday, April 22, 2011 12:10 PM
> To: user@hbase.apache.org
> Subject: RE: Row Key Question
>
> Regions split when they grow larger than the configured region size.  Your data is small enough to fit in a single region.
>
> Keys are sorted within a region.  When a region splits, the new regions are each about half the size of the original and each contains half of the key space.
>
> Dave
>
> -----Original Message-----
> From: Peter Haidinyak [mailto:phaidinyak@local.com]
> Sent: Friday, April 22, 2011 10:41 AM
> To: user@hbase.apache.org
> Subject: Row Key Question
>
> I have a question on how HBase decides to save rows based on Row Keys. Say I have a million rows to insert into a new table in a ten node cluster. Each row's key is some random 32 byte value and there are two columns per row, each column contains some random 32 byte value.
> My question is how does HBase know when to 'split' the table between the ten nodes? Or how does HBase 'split' the random keys between the ten nodes?
>
> Thanks
>
> -Pete
>

RE: Row Key Question

Posted by Peter Haidinyak <ph...@local.com>.
Thanks, that's the way I visualized it happening. Then the assumption is this process would continue until every server in the cluster has one region of data (more or less). My underlying question is that I need to store my data with the key starting with the date (YYYY-MM-DD). I know this means I will have hot spots during inserts, but it makes retrieval more efficient by using a scan with start and end rows. I was thinking of adding a prefix number of 00 to 09, for the ten servers. In theory, each server should end up with only one of the prefixes. Then during retrieval I could use ten threads, each using a start and end row with its prefix, and the query should be distributed evenly among the servers. I'm not sure if using ten threads to insert the data would buy me anything or not. Anyway, I'm going to try this out at home on my own cluster to see how it performs.
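
Roughly what I have in mind, as an untested sketch (the table name,
key layout and two-digit salt are just placeholders):

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.util.Bytes;

    // Untested sketch of the 00-09 salting idea; "events" and the key
    // layout (salt + YYYY-MM-DD + natural key) are placeholders.
    public class SaltedScanSketch {
      static final int SALT_BUCKETS = 10;

      // The same natural key always lands in the same bucket.
      static String saltedKey(String day, String naturalKey) {
        int bucket = (naturalKey.hashCode() & 0x7fffffff) % SALT_BUCKETS;
        return String.format("%02d%s%s", bucket, day, naturalKey);
      }

      // One scanner per bucket covering [startDay, stopDay); each scanner
      // can be drained by its own thread and the results merged client-side.
      static List<ResultScanner> scannersForRange(Configuration conf,
          String startDay, String stopDay) throws IOException {
        List<ResultScanner> scanners = new ArrayList<ResultScanner>();
        for (int b = 0; b < SALT_BUCKETS; b++) {
          String prefix = String.format("%02d", b);
          Scan scan = new Scan(Bytes.toBytes(prefix + startDay),
                               Bytes.toBytes(prefix + stopDay));
          HTable table = new HTable(conf, "events");
          scanners.add(table.getScanner(scan));
        }
        return scanners;
      }
    }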

Thanks

-Pete

-----Original Message-----
From: Buttler, David [mailto:buttler1@llnl.gov] 
Sent: Friday, April 22, 2011 12:10 PM
To: user@hbase.apache.org
Subject: RE: Row Key Question

Regions split when they grow larger than the configured region size.  Your data is small enough to fit in a single region.

Keys are sorted within a region.  When a region splits, the new regions are each about half the size of the original and each contains half of the key space.

Dave

-----Original Message-----
From: Peter Haidinyak [mailto:phaidinyak@local.com] 
Sent: Friday, April 22, 2011 10:41 AM
To: user@hbase.apache.org
Subject: Row Key Question

I have a question on how HBase decides to save rows based on Row Keys. Say I have a million rows to insert into a new table in a ten node cluster. Each row's key is some random 32 byte value and there are two columns per row, each column contains some random 32 byte value. 
My question is how does HBase know when to 'split' the table between the ten nodes? Or how does HBase 'split' the random keys between the ten nodes? 

Thanks

-Pete

RE: Row Key Question

Posted by "Buttler, David" <bu...@llnl.gov>.
Regions split when they grow larger than the configured region size.  Your data is small enough to fit in a single region.

Keys are sorted within a region.  When a region splits, the new regions are each about half the size of the original and each contains half of the key space.
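
For reference, the relevant parameter is hbase.hregion.max.filesize,
and it can also be overridden per table. A rough sketch (table and
family names are placeholders):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class RegionSizeSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Cluster-wide split threshold (256MB default in this era).
        System.out.println("split size: "
            + conf.getLong("hbase.hregion.max.filesize", 256L * 1024 * 1024));

        // Per-table override: split this table's regions at ~512MB instead.
        HTableDescriptor desc = new HTableDescriptor("mytable");
        desc.addFamily(new HColumnDescriptor("d"));
        desc.setMaxFileSize(512L * 1024 * 1024);
        new HBaseAdmin(conf).createTable(desc);
      }
    }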

Dave

-----Original Message-----
From: Peter Haidinyak [mailto:phaidinyak@local.com] 
Sent: Friday, April 22, 2011 10:41 AM
To: user@hbase.apache.org
Subject: Row Key Question

I have a question on how HBase decides to save rows based on Row Keys. Say I have a million rows to insert into a new table in a ten node cluster. Each row's key is some random 32 byte value and there are two columns per row, each column contains some random 32 byte value. 
My question is how does HBase know when to 'split' the table between the ten nodes? Or how does HBase 'split' the random keys between the ten nodes? 

Thanks

-Pete