Posted to user@hbase.apache.org by stack <st...@duboce.net> on 2009/03/01 00:05:44 UTC
Re: strange region name, is it right?
This sounds like an interesting exercise. We should do the same on this end,
proving a release on a cluster just before we put it out.
Are the keys that TeraGen makes binary? Maybe check its source?
If they are, they'll look odd in the UI and in the shell; we don't support
them in the UI and shell (yet), but hbase should operate fine with binary
keys. Is it not working for you?
St.Ack
On Sat, Feb 28, 2009 at 2:56 AM, schubert zhang <zs...@gmail.com> wrote:
> I have been using HBase and Hadoop for 5 months.
>
> My testbed has 5 nodes (1 master and 4 slaves):
> Hadoop-0.19.1
> HBase-0.19.0
>
> 1. I used the TeraGen MapReduce job from the Hadoop examples to generate
> files of random key-value pairs.
> I created 1 GB of data and another 10 GB of data for later tests.
>
> 2. Then I wrote a job to read these TeraGen files and insert each record's
> key-value pair into an HBase table:
> create 'sort1g', {NAME => 't', VERSIONS => 1}
> create 'sort10g', {NAME => 't', VERSIONS => 1}
> I want to use these insert jobs to simulate TeraSort, since HBase
> automatically sorts rows.
>
> 3. After the insert jobs finished, I found the following strange thing on
> the HBase web interface:
>
> Name Region Server Encoded Name Start Key End Key
> ......
> sort10g,%ql`{^8Bcf,1235730412828 nd2-rack0-cloud:60020 155375382
> %ql`{^8Bcf &YK&Uop0a=
> sort10g,&YK&Uop0a=,1235730749832 nd1-rack0-cloud:60020 1574155935
> &YK&Uop0a= 'B'Zp+!]Tb
> sort10g,'B'Zp+!]Tb,1235730749832 nd1-rack0-cloud:60020 395792177
> 'B'Zp+!]Tb ()o:
> sort10g,()o: nd1-rack0-cloud:60020 1176340729 ()o: (qYp"7;j2$
> sort10g,(qYp"7;j2$,1235730731006 nd1-rack0-cloud:60020 2143364419
> (qYp"7;j2$ )Z/?>:ZM3Z
> sort10g,)Z/?>:ZM3Z,1235730853698 nd2-rack0-cloud:60020 440987412
> )Z/?>:ZM3Z *BuVHF#1ME
> .......
> sort10g,:Qt-(8;Y>i,1235730441379 nd1-rack0-cloud:60020 1461025497
> :Qt-(8;Y>i ;;Vg!IT[d"
> sort10g,;;Vg!IT[d",1235730461102 nd1-rack0-cloud:60020 36776992
> ;;Vg!IT[d" <$#
> sort10g,<$# nd1-rack0-cloud:60020 1430043392 <$#
> sort10g, nd3-rack0-cloud:60020 1176532237 =VyK?xTtI`
> sort10g,=VyK?xTtI`,1235730334262 nd3-rack0-cloud:60020 1165072084
> =VyK?xTtI` >A274Dj=vU
> .......
> sort10g,s#Y}pGP|{3,1235730476424 nd1-rack0-cloud:60020 1728348677
> s#Y}pGP|{3 soWA+0=0Ao
> sort10g,soWA+0=0Ao,1235730487163 nd1-rack0-cloud:60020 1275380223
> soWA+0=0Ao t\<
> sort10g,t\< nd1-rack0-cloud:60020 2080592534 t\< uI-1OW2g=t
> sort10g,uI-1OW2g=t,1235730515195 nd1-rack0-cloud:60020 232566103
> uI-1OW2g=t v6'-_5E]7'
>
>
> Some of the lines above do not look normal:
> sort10g,()o: nd1-rack0-cloud:60020 1176340729 ()o: (qYp"7;j2$
> sort10g,<$# nd1-rack0-cloud:60020 1430043392 <$#
> sort10g, nd3-rack0-cloud:60020 1176532237 =VyK?xTtI`
> sort10g,t\< nd1-rack0-cloud:60020 2080592534 t\< uI-1OW2g=t
>
>
> Could you please tell me whether this is right or not?
>
Re: strange region name, is it right?
Posted by schubert zhang <zs...@gmail.com>.
Hi Stack,
I have sent the TeraDataGen and TeraDataSort code in a separate email to
your duboce.net address. Please take a look for reference.
1. The keys of TeraDataGen are not binary; they are printable ASCII
characters from ' ' (space) to '~'.
The format of each row is: (10 bytes key) (10 bytes rowid) (78 bytes filler) \r\n
The keys are random characters from the set ' ' .. '~'.
The rowid is the right-justified row id as an int.
The filler consists of 7 runs of 10 characters from 'A' to 'Z'.
I defined a very simple HBase table to store the sorted data:
create 't1', {NAME => 't', VERSIONS => 1}
The only column is t:v.
RowKey = (10 bytes key)
Column t:v's value = (10 bytes rowid)(78 bytes filler)\r\n
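
To make the layout above concrete, here is a minimal Python sketch of how one
100-byte record could be split into the row key and the t:v value. The names
split_record, KEY_LEN and so on are hypothetical and only illustrate the
format described above; this is not the actual TeraDataGen or insert-job code.

```python
# Field widths taken from the record format described above.
KEY_LEN = 10
ROWID_LEN = 10
FILLER_LEN = 78
RECORD_LEN = KEY_LEN + ROWID_LEN + FILLER_LEN + 2  # 100 bytes, including "\r\n"


def split_record(record: bytes):
    """Split one record into (row_key, cell_value).

    row_key    is the first 10 bytes.
    cell_value is everything after the key: rowid + filler + "\r\n" (90 bytes).
    """
    if len(record) != RECORD_LEN:
        raise ValueError(f"expected {RECORD_LEN} bytes, got {len(record)}")
    if not record.endswith(b"\r\n"):
        raise ValueError("record does not end with CR LF")
    return record[:KEY_LEN], record[KEY_LEN:]


if __name__ == "__main__":
    # Hand-built sample record; the filler here is a placeholder, not real data.
    key = b"&YK&Uop0a="
    rowid = b"42".rjust(ROWID_LEN)
    filler = b"A" * FILLER_LEN
    row_key, value = split_record(key + rowid + filler + b"\r\n")
    print(row_key, len(value))
```

The row key is kept as raw bytes on purpose, so nothing is lost before it
reaches the table.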
2. I have done more testing and found:
Because some row keys contain the characters '<' and/or '>', the web UI
cannot display them correctly. But the row key is correct when I get it
through the HBase API. Maybe the web UI code should be modified.
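
As an illustration of the kind of change I mean, here is a small Python
sketch of escaping a row key before it is written into an HTML table cell.
The function render_cell and the sample key are hypothetical; this is the
general technique, not the HBase web UI code.

```python
import html


def render_cell(row_key: str) -> str:
    """Return an HTML table cell with the row key safely escaped.

    html.escape replaces '&', '<', '>' (and quotes) with entities, so the
    browser shows the characters instead of treating them as markup.
    """
    return "<td>" + html.escape(row_key) + "</td>"


if __name__ == "__main__":
    # Hypothetical key containing characters that are special in HTML.
    print(render_cell("ab<cd>ef&g"))
```

Without the escaping step, a browser may read everything after a '<' as the
start of a tag, which would explain cells that appear truncated or empty.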
3. Another question:
I see that the region name in the web UI is delimited by commas.
Can I have a comma character in the row key string?
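
To show why I am asking, here is a Python sketch of one way a name of the
form table,startkey,id could be split without ambiguity. It assumes the
table name never contains a comma and the last field never does either; both
are my assumptions, and I do not know whether HBase parses names this way.
The function split_region_name is hypothetical.

```python
def split_region_name(region_name: str):
    """Split 'table,startkey,id' into its three parts.

    The table name is taken up to the FIRST comma and the id after the LAST
    comma, so any commas inside the start key are left untouched.
    """
    table, sep, rest = region_name.partition(",")
    if not sep:
        raise ValueError("no comma after table name")
    start_key, sep, region_id = rest.rpartition(",")
    if not sep:
        raise ValueError("no comma before region id")
    return table, start_key, region_id


if __name__ == "__main__":
    # A name copied from the listing in this thread.
    print(split_region_name("sort10g,%ql`{^8Bcf,1235730412828"))
    # A made-up name whose start key itself contains a comma.
    print(split_region_name("sort10g,ab,cd,1235730412828"))
```

If the name were instead split on every comma, a start key such as "ab,cd"
would be cut into two fields, which is the case I am worried about.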
Regards,
Schubert