Posted to user@hbase.apache.org by Chris Hostetter <ho...@fucit.org> on 2009/03/23 22:22:12 UTC
Some REST GET questions
I've got myself a little HBase install up and running on a small Hadoop
cluster, currently running...
HBase Version 0.19.0, r735381
HBase Compiled Sun Jan 18 14:29:34 PST 2009, stack
Hadoop Version 0.19.0, r713890
Hadoop Compiled Fri Nov 14 03:12:29 UTC 2008, ndaley
testing stuff out with the hbase shell, things are working nicely. I'm
also trying out the REST API, and I have a few questions about
how to execute certain queries.
First off, this is the table i'm testing with...
{NAME => 'userdata', IS_ROOT => 'false', IS_META => 'false',
FAMILIES => [{NAME => 'hist', BLOOMFILTER => 'false', COMPRESSION =>
'NONE', VERSIONS => '20', LENGTH => '2147483647', TTL => '-1', IN_MEMORY => 'false',
BLOCKCACHE => 'false'}, {NAME => 'user', BLOOMFILTER => 'false',
COMPRESSION => 'NONE', VERSIONS => '1', LENGTH => '2147483647', TTL =>
'-1', IN_MEMORY => 'false', BLOCKCACHE => 'false'}], INDEXES => []}
This hypothetical example being a user activity tracking system -- the
"keys" will be usernames, and for every action a user takes, a row will be
inserted into the userdata table. for some of the data i only care about
the last action the user took, and i put that in the "user" column family
(only 1 version) and for other pieces of data i want to keep a history of
the last 20 actions the user took (the "hist" column family)
My first question is about clarifying what should/shouldn't be base64
encoded. According to the wiki docs for the REST interface...
http://wiki.apache.org/hadoop/Hbase/HbaseRest
...the "value" portion of a column entry is base64
encoded, but the "name" is not -- this matches the behavior i observe when
POSTing data and then inspecting it using the hbase shell -- however when
I GET results from a query using the REST interface, the names are coming
back base64 encoded as well. This message from a year ago seems to
suggest that this is the expected behavior because names "can be arbitrary
binary strings." ...
http://markmail.org/message/dyrnxphcjp3g4ow4
...but in that case there is an API discrepancy between the I and the O in
the I/O of the REST interface. which is considered more correct? is
there a migration plan for rectifying the discrepancy?
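To make the asymmetry concrete, here is a minimal Python sketch using only the standard library. It assumes the behavior described above (plain name and encoded value on POST, both encoded on GET) rather than any documented contract, and the dictionary shapes and helper names are hypothetical, not the actual REST payload format.

```python
import base64


def b64(text: str) -> str:
    """Base64-encode a UTF-8 string and return ASCII text."""
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def unb64(encoded: str) -> str:
    """Decode base64 text back into a UTF-8 string."""
    return base64.b64decode(encoded).decode("utf-8")


# Assumed write path, per the wiki: plain column name, encoded value.
posted = {"name": "hist:vote", "value": b64("23")}

# Assumed read path, as observed: name and value both encoded.
fetched = {"name": b64("hist:vote"), "value": b64("23")}

# A client therefore has to decode the name only when reading.
decoded = {"name": unb64(fetched["name"]), "value": unb64(fetched["value"])}
print(decoded)
```

If names really can be arbitrary binary strings, decoding to UTF-8 as done here would not always be safe; a client would keep the raw bytes instead.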
Second Question: querying for multiple versions. I'm trying to figure out
how i can execute the following query (from the hbase shell) via the REST
interface...
get 'userdata', 'hossman', {COLUMN => 'hist:vote', VERSIONS => 10}
...my naive assumption based on the other examples on the wiki is that
something like this might work...
http://host:60010/api/userdata/row/hossman?column=hist:vote&versions=10
...but the "versions" request param seems to be ignored. Is this type of
multi-version query at all supported in the REST interface?
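For reference, a small Python sketch of how a client might assemble that URL. It only builds the request string in the shape shown above; the "versions" parameter is the guess described here, not a confirmed part of the REST API, and the function name is hypothetical.

```python
from urllib.parse import quote, urlencode


def row_url(base: str, table: str, row: str, column: str, versions: int) -> str:
    """Build a row GET URL in the shape shown above."""
    path = f"{base}/api/{quote(table)}/row/{quote(row)}"
    # safe=":" keeps the family:qualifier separator unescaped in the query.
    query = urlencode({"column": column, "versions": versions}, safe=":")
    return f"{path}?{query}"


print(row_url("http://host:60010", "userdata", "hossman", "hist:vote", 10))
```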
My last question also relates to querying for multiple versions of columns
-- the key question being "column(s)" plural. as i mentioned before, this
query in the hbase shell works fine for getting the last 10 versions of a
specific column...
get 'userdata', 'hossman', {COLUMN => 'hist:vote', VERSIONS => 10}
...but i can't seem to find any way to indicate that i want the last
10 versions of *all* the columns associated with the specified key
-- in either the REST interface or the hbase shell. I was particularly
surprised by this error...
get 'userdata', 'hossman', { VERSIONS => 10 }
TypeError: can't convert Hash into String
from /var/opt/chrish-hadoop/hbase-0.19.0/bin/../bin/hirb.rb:326:in `get'
from /var/opt/chrish-hadoop/hbase-0.19.0/bin/../bin/hirb.rb:326:in `get'
from (hbase):47:in `binding'
Maybe IRB bug!!
...and the fact that this query only produced the most recent values for
the specified columns (even though querying for either of them
individually with the VERSIONS=>10 option produced the full list for
each)...
get 'userdata','hossman',{COLUMNS=>['hist:vote','hist:doc'],VERSIONS=>10}
COLUMN CELL
hist:doc timestamp=1237842101205, value=2908
hist:vote timestamp=1237842101205, value=23
2 row(s) in 0.0360 seconds
Obviously anything in the "user" family only has one version (because
that's the way the family was declared) but that's ok -- my goal is to get
whatever data is available going back up to 10 versions. It's not so bad
if i have to execute two REST GETs: one for all of the current values in
the 'user' family, and one for the last 10 versions of all the values in
the 'hist' family; and it's not the end of the world if i have to
explicitly list all of the column names i want in each request -- but
making a separate request for every column name that has multiple versions
seems like it could get prohibitive.
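As a rough illustration of the fallback, here is a Python sketch of the client-side merge that one request per column would require. It assumes each single-column response has already been parsed into (timestamp, value) pairs; the function name and the older sample cell are made up for illustration, and only the newest cells come from the shell output above.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, str]  # (timestamp, value)


def merge_versions(per_column: Dict[str, List[Cell]], limit: int) -> Dict[str, List[Cell]]:
    """Keep at most `limit` cells per column, newest timestamp first."""
    merged = {}
    for column, cells in per_column.items():
        newest_first = sorted(cells, key=lambda cell: cell[0], reverse=True)
        merged[column] = newest_first[:limit]
    return merged


# One entry per single-column request. The older hist:vote cell is
# hypothetical sample data, not taken from the real table.
responses = {
    "hist:vote": [(1237842000000, "17"), (1237842101205, "23")],
    "hist:doc": [(1237842101205, "2908")],
}
result = merge_versions(responses, limit=10)
print(result)
```

The merge itself is cheap; the cost that worries me is the number of round trips, which grows with the number of multi-version columns.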
Thanks in advance for any light people might be able to shed on these
questions.
-Hoss
Re: Some REST GET questions
Posted by Billy Pearson <sa...@pearsonwholesale.com>.
I did a PHP class:
https://issues.apache.org/jira/browse/HBASE-37
If you are using PHP, or can read PHP, it should give you some clues
about the API.
Billy
Re: Some REST GET questions
Posted by Billy Pearson <sa...@pearsonwholesale.com>.
Also note you might look into using Thrift; I think it took over a lot of
users from REST.
The support for keeping REST up to date and tested may not be there any more.
And the PHP class is from a long time ago (9/2008); there have been lots of
changes in HBase since then.
Billy