Posted to user@hbase.apache.org by stack <st...@duboce.net> on 2009/03/01 00:05:44 UTC

Re: strange region name, is it right?

This sounds like an interesting exercise. We should do the same on this end,
proving a release on a cluster just before we put it out.
Are the keys that TeraGen makes binary? Maybe check its source?

If they are, they'll look odd in the UI and in the shell; we don't support them
in the UI and shell (yet), but HBase should operate fine with binary keys. Is it
not working for you?

St.Ack


On Sat, Feb 28, 2009 at 2:56 AM, schubert zhang <zs...@gmail.com> wrote:

> I have been using HBase and Hadoop for 5 months.
>
> My testbed has 5 nodes (1 master and 4 slaves):
> Hadoop-0.19.1
> HBase-0.19.0
>
> 1. I use the TeraGen MapReduce job from the Hadoop examples to generate files
> with random key-value pairs.
>    I created 1G of data and another 10G of data for later tests.
>
> 2. Then I wrote a job to read these TeraGen files and insert each record's
> key-value into an HBase table.
>    create 'sort1g', {NAME => 't', VERSIONS => 1}
>    create 'sort10g', {NAME => 't', VERSIONS => 1}
>    I want to use these insert jobs to simulate TeraSort, since HBase
> automatically sorts rows.
>
> 3. After the insert jobs finished, I found the following strange thing on the
> HBase web interface:
>
> Name Region Server Encoded Name Start Key End Key
> ......
> sort10g,%ql`{^8Bcf,1235730412828   nd2-rack0-cloud:60020   155375382
>  %ql`{^8Bcf   &YK&Uop0a=
> sort10g,&YK&Uop0a=,1235730749832  nd1-rack0-cloud:60020  1574155935
>  &YK&Uop0a=  'B'Zp+!]Tb
> sort10g,'B'Zp+!]Tb,1235730749832  nd1-rack0-cloud:60020  395792177
>  'B'Zp+!]Tb  ()o:
> sort10g,()o:  nd1-rack0-cloud:60020  1176340729  ()o:  (qYp"7;j2$
> sort10g,(qYp"7;j2$,1235730731006  nd1-rack0-cloud:60020  2143364419
>  (qYp"7;j2$  )Z/?>:ZM3Z
> sort10g,)Z/?>:ZM3Z,1235730853698  nd2-rack0-cloud:60020  440987412
>  )Z/?>:ZM3Z  *BuVHF#1ME
> .......
> sort10g,:Qt-(8;Y>i,1235730441379   nd1-rack0-cloud:60020   1461025497
>  :Qt-(8;Y>i   ;;Vg!IT[d"
> sort10g,;;Vg!IT[d",1235730461102  nd1-rack0-cloud:60020  36776992
>  ;;Vg!IT[d"  <$#
> sort10g,<$#  nd1-rack0-cloud:60020  1430043392  <$#
> sort10g,  nd3-rack0-cloud:60020  1176532237   =VyK?xTtI`
> sort10g,=VyK?xTtI`,1235730334262  nd3-rack0-cloud:60020  1165072084
>  =VyK?xTtI`  >A274Dj=vU
>  .......
> sort10g,s#Y}pGP|{3,1235730476424   nd1-rack0-cloud:60020   1728348677
>  s#Y}pGP|{3   soWA+0=0Ao
> sort10g,soWA+0=0Ao,1235730487163  nd1-rack0-cloud:60020  1275380223
>  soWA+0=0Ao  t\<
> sort10g,t\<  nd1-rack0-cloud:60020  2080592534  t\<  uI-1OW2g=t
> sort10g,uI-1OW2g=t,1235730515195  nd1-rack0-cloud:60020  232566103
>  uI-1OW2g=t  v6'-_5E]7'
>
>
> In the lines above, some do not look normal:
> sort10g,()o:  nd1-rack0-cloud:60020  1176340729  ()o:  (qYp"7;j2$
> sort10g,<$#  nd1-rack0-cloud:60020  1430043392  <$#
> sort10g,  nd3-rack0-cloud:60020  1176532237   =VyK?xTtI`
> sort10g,t\<  nd1-rack0-cloud:60020  2080592534  t\<  uI-1OW2g=t
>
>
> Could you please tell me whether this is right or not?
>

Re: strange region name, is it right?

Posted by schubert zhang <zs...@gmail.com>.
Hi Stack,

I have sent the TeraDataGen and TeraDataSort code to your duboce.net address in
another email. Please check it for reference.

1. The keys of TeraDataGen are not binary; they are displayable ASCII
characters from ' ' (space) to '~'.
The format of each row is: (10 bytes key) (10 bytes rowid) (78 bytes filler)
\r\n
The keys are random characters from the set ' ' .. '~'.
The rowid is the right-justified row id as an int.
The filler consists of 7 runs of 10 characters from 'A' to 'Z'.

I defined a very simple HBase table to store the sorted data: create 't1',
{NAME => 't', VERSIONS => 1}; the only column is t:v.
RowKey = (10 bytes key)
Value of column t:v = (10 bytes rowid)(78 bytes filler)\r\n
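
As a rough illustration of the layout in point 1, the sketch below splits one
100-byte record into the row key and the t:v value. It is written in Python
only for brevity; the name split_record and the sample record are hypothetical
and are not taken from TeraDataGen or from the insert job.

```python
# Field widths as described above: 10 + 10 + 78 + 2 (\r\n) = 100 bytes per row.
KEY_LEN = 10
ROWID_LEN = 10
FILLER_LEN = 78
RECORD_LEN = KEY_LEN + ROWID_LEN + FILLER_LEN + 2


def split_record(record: bytes):
    """Return (row_key, value) for one fixed-width record.

    row_key is the first 10 bytes; value is everything after it
    (rowid + filler + trailing \\r\\n), matching the t:v mapping above.
    """
    if len(record) != RECORD_LEN:
        raise ValueError(f"expected {RECORD_LEN} bytes, got {len(record)}")
    return record[:KEY_LEN], record[KEY_LEN:]


# Hypothetical sample row; the real filler pattern comes from the generator.
sample = b"<$#abcdefg" + b"7".rjust(ROWID_LEN) + b"A" * FILLER_LEN + b"\r\n"
row_key, value = split_record(sample)
```

Here row_key is the 10-byte key and value holds the remaining 90 bytes.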

2. I have done more tests and found:
Because some row keys contain the character '<' and/or '>', the web UI cannot
display them correctly. But the row key is correct when I get it through the
HBase API. Maybe the Web UI code should be modified.
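
The symptom in point 2 is consistent with the keys being written into the page
without HTML escaping, so that a browser treats '<' as the start of a tag; this
is an assumption, not something confirmed in this thread. The Python sketch
below only illustrates the idea using the standard library's html.escape. The
HBase web UI itself is not written in Python, and display_key is a hypothetical
name.

```python
import html


def display_key(row_key: str) -> str:
    """Escape &, <, > and quotes so a browser shows the key literally."""
    return html.escape(row_key)


print(display_key("<$#"))   # &lt;$#
print(display_key("t\\<"))  # t\&lt;
```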

3. Another question:
I found that the Region Name in the Web UI is delimited by commas.
Can I have a comma character in the row key string?
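
Judging from the listing, a region name looks like table,startkey,id. The
sketch below shows one way such a name could still be split unambiguously when
the start key contains commas, by cutting at the first and at the last comma.
It assumes that table names never contain a comma and that the trailing id is
always present. Whether HBase parses region names this way is not confirmed
here, and parse_region_name is a hypothetical helper.

```python
def parse_region_name(name: str):
    """Split 'table,startkey,id' into its three parts.

    The table name ends at the first comma and the id starts after the
    last comma, so commas inside the start key are preserved.
    """
    table, first_sep, rest = name.partition(",")
    start_key, last_sep, region_id = rest.rpartition(",")
    if not first_sep or not last_sep:
        raise ValueError(f"not in table,startkey,id form: {name!r}")
    return table, start_key, region_id


# A name taken from the listing, and a made-up one whose key contains a comma.
plain = parse_region_name("sort10g,%ql`{^8Bcf,1235730412828")
with_comma = parse_region_name("sort10g,ab,cd,1235730412828")
```

With this scheme the second call yields the start key "ab,cd" unchanged.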

Regards,
Schubert
