Posted to common-dev@hadoop.apache.org by "Jim Kellerman (JIRA)" <ji...@apache.org> on 2007/12/03 09:12:43 UTC

[jira] Created: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?

[hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?
----------------------------------------------------------------------

                 Key: HADOOP-2334
                 URL: https://issues.apache.org/jira/browse/HADOOP-2334
             Project: Hadoop
          Issue Type: Wish
          Components: contrib/hbase
    Affects Versions: 0.16.0
            Reporter: Jim Kellerman
            Assignee: Jim Kellerman
            Priority: Minor
             Fix For: 0.16.0


I have heard from several people that row keys in HBase should be less restricted than hadoop.io.Text.

What do you think?

At the very least, a row key has to be a WritableComparable. This would lead to the most general case being either hadoop.io.BytesWritable or hbase.io.ImmutableBytesWritable. The primary difference between these two classes is that hadoop.io.BytesWritable allocates 100 bytes by default, and if you do not pay attention to the length (BytesWritable.getSize()), converting a String to a BytesWritable and vice versa can become problematic.

hbase.io.ImmutableBytesWritable, in contrast, allocates only as many bytes as you pass in and then does not allow the size to be changed.

If we were to change from Text to a non-text key, my preference would be for ImmutableBytesWritable, because it has a fixed size once set, and operations like get, etc. do not have to do something like System.arraycopy, where you specify the number of bytes to copy.
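
To illustrate the length pitfall described above, here is a minimal sketch. It does not use the real Hadoop or HBase classes: PaddedBytes and FixedBytes are hypothetical stand-ins, and the 100-byte capacity is an assumption taken from the description above. The sketch only shows why a buffer whose backing array is larger than its contents needs an explicit length when converting back to a String, while a fixed-size wrapper does not.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a growable buffer whose backing array can exceed its contents. */
final class PaddedBytes {
    private final byte[] buffer = new byte[100]; // assumed default capacity
    private int size;                            // number of valid bytes

    void set(byte[] src) {
        if (src.length > buffer.length) {
            throw new IllegalArgumentException("too large for this sketch");
        }
        // The caller-visible length must be tracked separately from the array length.
        System.arraycopy(src, 0, buffer, 0, src.length);
        size = src.length;
    }

    byte[] get() { return buffer; }  // whole backing array, padding included
    int getSize() { return size; }
}

/** Hypothetical stand-in for a fixed-size wrapper: holds exactly the bytes passed in. */
final class FixedBytes {
    private final byte[] bytes;
    FixedBytes(byte[] src) { this.bytes = src.clone(); }
    byte[] get() { return bytes.clone(); }
}

class RowKeyBytesDemo {
    /** Ignores the valid length, so the padding bytes end up in the String. */
    static String naiveDecode(PaddedBytes b) {
        return new String(b.get(), StandardCharsets.UTF_8);
    }

    /** Uses the valid length, so only the real contents are decoded. */
    static String carefulDecode(PaddedBytes b) {
        return new String(b.get(), 0, b.getSize(), StandardCharsets.UTF_8);
    }

    /** No length bookkeeping needed: the array is exactly the contents. */
    static String decode(FixedBytes b) {
        return new String(b.get(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = "row1".getBytes(StandardCharsets.UTF_8);
        PaddedBytes padded = new PaddedBytes();
        padded.set(raw);
        System.out.println(naiveDecode(padded).length());   // 100: "row1" plus 96 NUL characters
        System.out.println(carefulDecode(padded));           // row1
        System.out.println(decode(new FixedBytes(raw)));     // row1
    }
}
```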

Your comments and questions on this issue are welcome. If we receive enough feedback that Text is too restrictive, we are willing to change it, but we also need to hear what would be the most useful thing to change it to.



-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549265 ] 

Jim Kellerman commented on HADOOP-2334:
---------------------------------------

> class HTable<K extends WritableComparable>

++1 Ooh I like it, will have to check to see if it works all the way through though.



[jira] Commented: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?

Posted by "stack (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556984#action_12556984 ] 

stack commented on HADOOP-2334:
-------------------------------

Chatting with Dave Simpson: if different clients insert rows with keys of different types into the one table, the hodge-podge of type comparators produces an undefined sort order. To protect against this, the key type for a table should be defined as part of table creation, with an illegal type exception thrown if a client tries an update with a non-matching type.
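
A rough sketch of that idea, under stated assumptions: TypedKeyTable is a hypothetical class, not HBase code, java.lang.Comparable stands in for WritableComparable, and IllegalArgumentException stands in for the "illegal type exception". The key class is fixed when the table is created and every update is checked against it.

```java
import java.util.TreeMap;

/** Hypothetical table whose row-key class is fixed at creation time. */
final class TypedKeyTable {
    private final Class<?> keyType;
    // Natural ordering; safe only because put() admits a single key class.
    private final TreeMap<Object, byte[]> rows = new TreeMap<>();

    TypedKeyTable(Class<? extends Comparable<?>> keyType) {
        this.keyType = keyType;
    }

    /** Rejects any row key whose class differs from the declared key class. */
    void put(Object row, byte[] value) {
        if (row == null || row.getClass() != keyType) {
            throw new IllegalArgumentException(
                "row key must be of type " + keyType.getName());
        }
        rows.put(row, value);
    }

    Object firstRow() { return rows.firstKey(); }
    int size() { return rows.size(); }
}

class TypedKeyTableDemo {
    public static void main(String[] args) {
        TypedKeyTable table = new TypedKeyTable(String.class);
        table.put("b", new byte[0]);
        table.put("a", new byte[0]);
        System.out.println(table.firstRow()); // a
        try {
            table.put(42, new byte[0]);        // Integer key into a String-keyed table
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Checking for an exact class match (rather than a subclass) is a design choice of this sketch; it keeps a single comparator in play for the whole table.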



[jira] Commented: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?

Posted by "Kevin Beyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549167 ] 

Kevin Beyer commented on HADOOP-2334:
-------------------------------------

Would it be difficult to allow the user to declare a WritableComparable class for the key when creating a table?   I think we should be able to get enough performance and gain considerable flexibility. The default could be Text or BytesWritable, or whatever you choose. For jaql, I would really like to use my own WritableComparable as the key.



[jira] Commented: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?

Posted by "Kevin Beyer (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549246 ] 

Kevin Beyer commented on HADOOP-2334:
-------------------------------------

> Do you want to tie row keys to be a specific kind of WritableComparable, or would this work for you?

This works for me.  I was confused by the discussion on ImmutableBytesWritable.

Though I don't require it, does it make sense to follow the use of generics in map/reduce?

class HTable<K extends WritableComparable>
{
  ...
  public byte[] get(K row, Text column) throws IOException;
  public HScannerInterface<K> obtainScanner(Text[] columns, K startRow) throws IOException;
}
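
To make the generic idea concrete, here is a self-contained sketch. It is not the HTable API: GenericTable is a hypothetical in-memory class, java.lang.Comparable stands in for WritableComparable, and columns are left out to keep it short. It shows the compiler tying get and scan calls to the table's declared key type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical in-memory table parameterized by its row-key type. */
final class GenericTable<K extends Comparable<K>> {
    private final TreeMap<K, byte[]> rows = new TreeMap<>();

    void put(K row, byte[] value) { rows.put(row, value); }

    /** Returns the stored value, or null if the row does not exist. */
    byte[] get(K row) { return rows.get(row); }

    /** Row keys in sorted order, starting at startRow (inclusive). */
    List<K> scanFrom(K startRow) {
        return new ArrayList<>(rows.tailMap(startRow, true).keySet());
    }
}

class GenericTableDemo {
    public static void main(String[] args) {
        GenericTable<Integer> table = new GenericTable<>();
        table.put(30, new byte[] {3});
        table.put(10, new byte[] {1});
        table.put(20, new byte[] {2});
        System.out.println(table.scanFrom(15)); // [20, 30]
        // table.put("row", new byte[0]);       // would not compile: String is not Integer
    }
}
```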



[jira] Commented: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12549217 ] 

Jim Kellerman commented on HADOOP-2334:
---------------------------------------

Kevin Beyer - 06/Dec/07 12:12 PM
> Would it be difficult to allow the user to declare a WritableComparable class for the key when creating a table?
> I think we should be able to get enough performance and gain considerable flexibility. The default could be
> Text or BytesWritable, or whatever you choose. For jaql, I would really like to use my own WritableComparable
> as the key.

What I was proposing as the row key was WritableComparable. Thus (for example) the following APIs:

{code}
public byte[] get(Text row, Text column) throws IOException
public HScannerInterface obtainScanner(Text[] columns, Text startRow)  throws IOException
{code}

would become:

{code}
public byte[] get(WritableComparable row, Text column) throws IOException
public HScannerInterface obtainScanner(Text[] columns, WritableComparable startRow)  throws IOException
{code}

Do you want to tie row keys to be a specific kind of WritableComparable, or would this work for you?
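
One thing to weigh with the untyped form: nothing at compile time stops two callers from using different key classes against the same table. The sketch below is only an analogy, with java.lang.Comparable and a raw java.util.TreeMap standing in for WritableComparable and the table's sorted storage (MixedKeyDemo is a hypothetical name). In plain Java, mixing key classes in one sorted map fails at runtime.

```java
import java.util.TreeMap;

class MixedKeyDemo {
    /** Returns true if inserting keys of two different classes into one sorted map fails. */
    @SuppressWarnings({"rawtypes", "unchecked"})
    static boolean mixingFails() {
        TreeMap rows = new TreeMap();       // raw type: any key is accepted at compile time
        rows.put("row1", new byte[0]);
        try {
            // Comparing an Integer key against the existing String key cannot work.
            rows.put(Integer.valueOf(2), new byte[0]);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(mixingFails()); // true
    }
}
```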




[jira] Assigned: (HADOOP-2334) [hbase] VOTE: should row keys be less restrictive than hadoop.io.Text?

Posted by "Jim Kellerman (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HADOOP-2334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jim Kellerman reassigned HADOOP-2334:
-------------------------------------

    Assignee:     (was: Jim Kellerman)
