Posted to user@hbase.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/03/23 22:22:12 UTC

Some REST GET questions

I've got myself a little HBase install up and running on a small Hadoop 
cluster, currently running...
 	HBase Version	0.19.0, r735381
 	HBase Compiled	Sun Jan 18 14:29:34 PST 2009, stack
 	Hadoop Version	0.19.0, r713890
 	Hadoop Compiled	Fri Nov 14 03:12:29 UTC 2008, ndaley

Testing stuff out with the hbase shell, things are working nicely.  I'm 
also trying out the REST API, and I have a few questions about
how to execute certain queries.

First off, this is the table i'm testing with...

{NAME => 'userdata', IS_ROOT => 'false', IS_META => 'false',
  FAMILIES => [{NAME => 'hist', BLOOMFILTER => 'false', COMPRESSION => 
'NONE', VERSIONS => '20', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false', 
BLOCKCACHE => 'false'}, {NAME => 'user', BLOOMFILTER => 'false', 
COMPRESSION => 'NONE', VERSIONS => '1', LENGTH => '2147483647', TTL => 
'-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}], INDEXES => []}

This hypothetical example being a user activity tracking system -- the 
"keys" will be usernames, and for every action a user takes, a row will be 
inserted into the userdata table.  for some of the data i only care about 
the last action the user took, and i put that in the "user" column family 
(only 1 version) and for other pieces of data i want to keep a history of 
the last 20 actions the user took (the "hist" column family)

My first question is about clarifying what should/shouldn't be base64 
encoded.  According to the wiki docs for the REST interface...
   http://wiki.apache.org/hadoop/Hbase/HbaseRest
...the "value" portion of a column entry is base64 
encoded, but the "name" is not -- this matches the behavior i observe when 
POSTing data and then inspecting it using the hbase shell -- however when 
I GET results from a query using the REST interface, the names are coming 
back base64 encoded as well.  This message from a year ago seems to 
suggest that this is the expected behavior because names "can be arbitrary 
binary strings." ...
   http://markmail.org/message/dyrnxphcjp3g4ow4

...but in that case there is an API discrepancy between the I and the O in 
the I/O of the REST interface.  Which is considered more correct? Is 
there a migration plan for rectifying the discrepancy?
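
To make the asymmetry concrete, here is a minimal Python sketch of what a 
client ends up doing under the behavior described above: base64-encoding 
only the value when writing, but base64-decoding both the name and the 
value when reading.  The helper names, and the idea that the response has 
already been parsed into name/value strings, are assumptions for 
illustration and not part of the HBase REST API.

```python
import base64


def encode_value(value: bytes) -> str:
    """Base64-encode a cell value for a write (the name is sent as-is)."""
    return base64.b64encode(value).decode("ascii")


def decode_cell(name_b64: str, value_b64: str):
    """Decode a (name, value) pair as observed on reads, where both
    parts come back base64 encoded."""
    return base64.b64decode(name_b64), base64.b64decode(value_b64)


# Hypothetical cell: column 'hist:vote' holding the value '23'.
name, value = decode_cell("aGlzdDp2b3Rl", encode_value(b"23"))
```

Here name is b'hist:vote' and value is b'23'; both stay as bytes because 
names "can be arbitrary binary strings."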


Second Question: querying for multiple versions.  I'm trying to figure out 
how i can execute the following query (from the hbase shell) via the REST 
interface...
    get 'userdata', 'hossman', {COLUMN => 'hist:vote', VERSIONS => 10}
...my naive assumption based on the other examples on the wiki is that 
something like this might work...
    http://host:60010/api/userdata/row/hossman?column=hist:vote&versions=10
...but the "versions" request param seems to be ignored.  Is this type of 
multi-version query at all supported in the REST interface?
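
For anyone who wants to reproduce the request, the URL above can be built 
like this in Python.  This is only a sketch: the host, port and path are 
copied from the URL above, row_url is a hypothetical helper, and 
"versions" is the guessed parameter that seems to be ignored, so it 
should not be read as a documented option.

```python
from urllib.parse import quote, urlencode


def row_url(base, table, row, column, versions=None):
    """Build a row GET URL in the shape used above.

    'versions' is the guessed (apparently ignored) request parameter.
    """
    params = [("column", column)]
    if versions is not None:
        params.append(("versions", str(versions)))
    # Keep ':' literal so the query string matches the URL in the post.
    query = urlencode(params, safe=":")
    return "%s/%s/row/%s?%s" % (
        base, quote(table, safe=""), quote(row, safe=""), query)


url = row_url("http://host:60010/api", "userdata", "hossman",
              "hist:vote", versions=10)
```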


My last question also relates to querying for multiple versions of columns 
-- the key question being "column(s)" plural.  as i mentioned before, this 
query in the hbase shell works fine for getting the last 10 versions of a 
specific column...
     get 'userdata', 'hossman', {COLUMN => 'hist:vote', VERSIONS => 10}
...but i can't seem to find any way to indicate that i want the last 
10 versions of *all* the columns associated with the specified key 
-- in either the REST interface or the hbase shell. I was particularly 
surprised by this error...

    get 'userdata', 'hossman', { VERSIONS => 10 }
TypeError: can't convert Hash into String
 	from /var/opt/chrish-hadoop/hbase-0.19.0/bin/../bin/hirb.rb:326:in `get'
 	from /var/opt/chrish-hadoop/hbase-0.19.0/bin/../bin/hirb.rb:326:in `get'
 	from (hbase):47:in `binding'
Maybe IRB bug!!

...and the fact that this query only produced the most recent values for 
the specified columns (even though querying for either of them 
individually with the VERSIONS=>10 option produced the full list for 
each)...
    get 'userdata','hossman',{COLUMNS=>['hist:vote','hist:doc'],VERSIONS=>10}
COLUMN                       CELL
  hist:doc                    timestamp=1237842101205, value=2908
  hist:vote                   timestamp=1237842101205, value=23
2 row(s) in 0.0360 seconds

Obviously anything in the "user" family only has one version (because 
that's the way the family was declared) but that's ok -- my goal is to get 
whatever data is available going back up to 10 versions.  It's not so bad 
if i have to execute two REST GETs: one for all of the current values in 
the 'user' family, and one for the last 10 versions of all the values in 
the 'hist' family; and it's not the end of the world if i have to 
explicitly list all of the column names i want in each request -- but 
making a separate request for every column name that has multiple versions 
seems like it could get prohibitive.



Thanks in advance for any light people might be able to shed on these 
questions.


-Hoss


Re: Some REST GET questions

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
I wrote a PHP class:
https://issues.apache.org/jira/browse/HBASE-37

If you are using PHP, or can read PHP, it should give you some clues 
about how the API works.

Billy




Re: Some REST GET questions

Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Also note that you might look into using Thrift; I think it took over a 
lot of users from REST.
The support for keeping REST up to date and tested may not be there any more.

Also, the PHP class is from a long time ago (9/2008); there have been lots 
of changes in HBase since then.

Billy
