Posted to user@hbase.apache.org by Bing Li <lb...@gmail.com> on 2012/08/29 16:04:54 UTC

HBase Is So Slow To Save Data?

Dear all,

In my experience, HBase is very slow at saving data. Am I right?

For example, today I needed to save the data in a HashMap to HBase. It took
more than three hours. However, saving the same HashMap to a text file
through a redirected System.out took only 4.5 seconds!

Why is HBase so slow? Is it indexing?

My code to save data in HBase is as follows. I think the code must be
correct.

        ......
        public synchronized void AddVirtualOutgoingHHNeighbors(
                        ConcurrentHashMap<String, ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap,
                        int timingScale)
        {
                List<Put> puts = new ArrayList<Put>();

                String hhNeighborRowKey;
                Put hubKeyPut;
                Put groupKeyPut;
                Put topGroupKeyPut;
                Put timingScalePut;
                Put nodeKeyPut;
                Put hubNeighborTypePut;

                for (Map.Entry<String, ConcurrentHashMap<String, Set<String>>> sourceHubGroupNeighborEntry
                                : hhOutNeighborMap.entrySet())
                {
                        for (Map.Entry<String, Set<String>> groupNeighborEntry
                                        : sourceHubGroupNeighborEntry.getValue().entrySet())
                        {
                                for (String neighborKey : groupNeighborEntry.getValue())
                                {
                                        hhNeighborRowKey = NeighborStructure.HUB_HUB_NEIGHBOR_ROW
                                                        + Tools.GetAHash(sourceHubGroupNeighborEntry.getKey()
                                                                        + groupNeighborEntry.getKey()
                                                                        + timingScale + neighborKey);

                                        hubKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                                        hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
                                                        Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
                                                        Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
                                        puts.add(hubKeyPut);

                                        groupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                                        groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
                                                        Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
                                                        Bytes.toBytes(groupNeighborEntry.getKey()));
                                        puts.add(groupKeyPut);

                                        topGroupKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                                        topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
                                                        Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
                                                        Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
                                        puts.add(topGroupKeyPut);

                                        timingScalePut = new Put(Bytes.toBytes(hhNeighborRowKey));
                                        timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
                                                        Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
                                                        Bytes.toBytes(timingScale));
                                        puts.add(timingScalePut);

                                        nodeKeyPut = new Put(Bytes.toBytes(hhNeighborRowKey));
                                        nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
                                                        Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
                                                        Bytes.toBytes(neighborKey));
                                        puts.add(nodeKeyPut);

                                        hubNeighborTypePut = new Put(Bytes.toBytes(hhNeighborRowKey));
                                        hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
                                                        Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
                                                        Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
                                        puts.add(hubNeighborTypePut);
                                }
                        }
                }

                try
                {
                        this.neighborTable.put(puts);
                }
                catch (IOException e)
                {
                        e.printStackTrace();
                }
        }
        ......

Thanks so much!

Best regards,
Bing
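
One detail in the code above may be worth a look: it creates six separate Put objects for every row key, one per column. As far as I know, a single HBase Put can carry several columns of the same row, so the list could hold one entry per row instead of six. The sketch below only models that difference with a hypothetical stand-in class (RowPut is not the HBase Put class, and the column names are made up); whether this accounts for the slowdown reported here is not established in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a row mutation; this is NOT the HBase Put class. */
final class RowPut {
    final String rowKey;
    final Map<String, String> columns = new LinkedHashMap<>();

    RowPut(String rowKey) {
        this.rowKey = rowKey;
    }

    /** Adds one column to this row and returns the same object for chaining. */
    RowPut add(String qualifier, String value) {
        columns.put(qualifier, value);
        return this;
    }

    /** Mirrors the posted code: a new mutation object for every column of every row. */
    static List<RowPut> onePerColumn(List<String> rowKeys, Map<String, String> cells) {
        List<RowPut> puts = new ArrayList<>(rowKeys.size() * cells.size());
        for (String rowKey : rowKeys) {
            for (Map.Entry<String, String> cell : cells.entrySet()) {
                puts.add(new RowPut(rowKey).add(cell.getKey(), cell.getValue()));
            }
        }
        return puts;
    }

    /** Alternative: one mutation object per row that carries all of its columns. */
    static List<RowPut> onePerRow(List<String> rowKeys, Map<String, String> cells) {
        List<RowPut> puts = new ArrayList<>(rowKeys.size());
        for (String rowKey : rowKeys) {
            RowPut put = new RowPut(rowKey);
            for (Map.Entry<String, String> cell : cells.entrySet()) {
                put.add(cell.getKey(), cell.getValue());
            }
            puts.add(put);
        }
        return puts;
    }

    public static void main(String[] args) {
        List<String> rowKeys = Arrays.asList("row-a", "row-b", "row-c");
        Map<String, String> cells = new LinkedHashMap<>();
        // Made-up qualifiers standing in for the six columns in the posted code.
        for (String q : new String[] {"hub_key", "group_key", "top_group_key",
                                      "timing_scale", "node_key", "type"}) {
            cells.put(q, "value");
        }
        System.out.println("one object per column: " + onePerColumn(rowKeys, cells).size());
        System.out.println("one object per row:    " + onePerRow(rowKeys, cells).size());
    }
}
```

Both variants describe the same cells; the second simply keeps fewer objects in the list that is handed to the table.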

Re: HBase Is So Slow To Save Data?

Posted by Young Y Kim <yo...@gmail.com>.
In my experience, it is best to keep inserts under 15k/s per region server
to avoid GC and compaction problems.
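
One way to stay under such a rate is to send the data in batches and pace them. The following is a minimal sketch using only the Java standard library; the batch size of 5,000 is an assumption chosen for illustration, the 15,000 per second ceiling is the figure quoted above, and the place where a real client would write to the table is only marked with a comment.

```java
import java.util.ArrayList;
import java.util.List;

/** Paces batched writes so that the sustained rate stays under a target. */
final class ThrottledBatches {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < items.size(); from += batchSize) {
            int to = Math.min(from + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(from, to)));
        }
        return batches;
    }

    /** Minimum time in milliseconds one batch must occupy to stay under maxPerSecond. */
    static long minMillisPerBatch(int batchSize, int maxPerSecond) {
        // Integer ceiling of batchSize * 1000 / maxPerSecond.
        return (batchSize * 1000L + maxPerSecond - 1) / maxPerSecond;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 12_000; i++) {
            rows.add(i);
        }

        int batchSize = 5_000;      // assumption, tune for your cluster
        int maxPerSecond = 15_000;  // ceiling suggested in the message above
        long budgetMs = minMillisPerBatch(batchSize, maxPerSecond);

        for (List<Integer> batch : partition(rows, batchSize)) {
            long start = System.nanoTime();
            // A real client would send the batch to the table here.
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            if (elapsedMs < budgetMs) {
                Thread.sleep(budgetMs - elapsedMs);
            }
            System.out.println("sent " + batch.size() + " rows");
        }
    }
}
```

Sleeping only for the unused part of the budget keeps the loop from slowing down further when the write itself already takes long enough.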


Re: HBase Is So Slow To Save Data?

Posted by Bing Li <lb...@gmail.com>.
Dear Cristofer,

Thanks so much for the reminder!

Best regards,
Bing


Re: HBase Is So Slow To Save Data?

Posted by Cristofer Weber <cr...@neogrid.com>.
There are also a lot of conversions of the same values to their byte array representation, e.g., your NeighborStructure constants. You should do each conversion only once to save time, since you are currently doing it inside three nested loops. I'm not sure how much this will improve things, but you should try it as well.

Best regards,
Cristofer
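
A minimal sketch of that suggestion, using only the Java standard library. The class name and the constant values are hypothetical stand-ins for the NeighborStructure constants, and String.getBytes with UTF-8 stands in for Bytes.toBytes(String), which as far as I know also encodes as UTF-8. The timings it prints will vary from machine to machine.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical column constants, converted to byte arrays once instead of per row. */
final class NeighborColumns {
    // Made-up values; the real NeighborStructure constants are not shown in the thread.
    static final String FAMILY_NAME = "hh_neighbor";
    static final String HUB_KEY_COLUMN_NAME = "hub_key";

    // Converted a single time when the class is loaded. Treat these as read-only.
    static final byte[] FAMILY = utf8(FAMILY_NAME);
    static final byte[] HUB_KEY_COLUMN = utf8(HUB_KEY_COLUMN_NAME);

    /** Encodes a string as UTF-8, allocating a new array on every call. */
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        int rows = 1_000_000;
        long checksum = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            // Converts the same two constants again for every row.
            checksum += utf8(FAMILY_NAME).length + utf8(HUB_KEY_COLUMN_NAME).length;
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < rows; i++) {
            // Reuses the arrays converted once above.
            checksum += FAMILY.length + HUB_KEY_COLUMN.length;
        }
        long t2 = System.nanoTime();

        System.out.println("convert per row: " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("convert once:    " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("checksum: " + checksum);
    }
}
```

The shared arrays must never be modified after creation, because every row would then see the change.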


Re: HBase Is So Slow To Save Data?

Posted by Bing Li <lb...@gmail.com>.
I see. Thanks so much!

Bing


On Wed, Aug 29, 2012 at 11:59 PM, N Keywal <nk...@gmail.com> wrote:

> It's not useful here: if you have a memory issue, it occurs while you're
> using the list, not after you have finished with it and set it to null.
> You need to monitor the memory consumption of the JVM, on both the client
> and the server.
> Google these keywords; there are many examples on the web.
> Also search for ArrayList initialization.
>
> Note as well that what matters is not the size of the structure on disk
> but the size of the "List<Put> puts = new ArrayList<Put>();" before the
> table put.
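
As a rough illustration of the two points above, the sketch below reads the JVM's own heap figures through java.lang.Runtime and builds a list with its capacity set up front so that it does not have to regrow. It is an assumption-laden stand-in: the byte arrays take the place of Put objects, and the entry count and payload size are made up. The heap numbers are approximate because the garbage collector can run at any time.

```java
import java.util.ArrayList;
import java.util.List;

/** Reports approximate heap usage around the construction of a large list. */
final class HeapWatch {

    /** Heap currently in use by this JVM, in bytes (approximate). */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Builds a list of count payloads, presized so the backing array never regrows. */
    static List<byte[]> buildPresized(int count, int payloadBytes) {
        List<byte[]> list = new ArrayList<>(count);
        for (int i = 0; i < count; i++) {
            list.add(new byte[payloadBytes]);
        }
        return list;
    }

    public static void main(String[] args) {
        long before = usedHeapBytes();
        // Made-up sizes: 200,000 entries of 64 bytes stand in for the list of puts.
        List<byte[]> puts = buildPresized(200_000, 64);
        long after = usedHeapBytes();

        System.out.printf("entries: %d, heap grew by roughly %d KiB, max heap: %d MiB%n",
                puts.size(),
                (after - before) / 1024,
                Runtime.getRuntime().maxMemory() / (1024 * 1024));
    }
}
```

Comparing the growth against the maximum heap gives a first idea of whether the list alone is putting the client under memory pressure.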
>
> On Wed, Aug 29, 2012 at 5:42 PM, Bing Li <lb...@gmail.com> wrote:
>
> > Dear N Keywal,
> >
> > Thanks so much for your reply!
> >
> > The total amount of data is about 110M. The available memory, 2G, is
> > enough.
> >
> > In Java, I just set a collection to null so that it can be garbage
> > collected. Do you think that is fine?
> >
> > Best regards,
> > Bing
> >
> >
> > On Wed, Aug 29, 2012 at 11:22 PM, N Keywal <nk...@gmail.com> wrote:
> >
> >> Hi Bing,
> >>
> >> You should expect HBase to be slower in the generic case:
> >> 1) it writes much more data (see the HBase data model), with extra column
> >> qualifiers, timestamps & so on.
> >> 2) the data is written multiple times: once in the write-ahead log, once
> >> per replica on the datanodes & so on.
> >> 3) there are inter-process calls & inter-machine calls on the critical
> >> path.
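
Point 1 can be made concrete with a back-of-the-envelope estimate. The sketch below assumes the classic KeyValue layout as I understand it, where every cell stores the row key, family, qualifier, timestamp and type next to the value; newer HBase versions may add fields such as tags. The lengths used in main are made up for illustration.

```java
/** Rough per-cell size estimate, assuming the classic KeyValue layout. */
final class CellSizeEstimate {

    /**
     * Fixed bytes per cell under that assumption: key length (4), value length (4),
     * row length (2), family length (1), timestamp (8) and type (1).
     */
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    /** Estimated serialized size of one cell, in bytes. */
    static long cellBytes(int rowLen, int familyLen, int qualifierLen, int valueLen) {
        return FIXED_OVERHEAD + rowLen + familyLen + qualifierLen + valueLen;
    }

    public static void main(String[] args) {
        // Hypothetical lengths in bytes; substitute the real ones from your schema.
        int row = 40;
        int family = 10;
        int qualifier = 12;
        int value = 16;

        long perCell = cellBytes(row, family, qualifier, value);
        System.out.println("bytes per cell: " + perCell + " for " + value + " bytes of value");
        System.out.println("bytes per row with 6 such cells: " + 6 * perCell);
    }
}
```

Because the row key and column names are repeated in every cell, short keys and short qualifiers reduce the amount written.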
> >>
> >> This is the cost of the atomicity, reliability and scalability features.
> >> With these features in mind, HBase is reasonably fast to save data on a
> >> cluster.
> >>
> >> In your specific case (without points 2 & 3 above), the performance
> >> seems to be very bad.
> >>
> >> You should first look at:
> >> - how much time is spent in the put vs. preparing the list
> >> - do you have garbage collection going on? even swap?
> >> - what's the size of your final Array vs. the available memory?
> >>
> >> Cheers,
> >>
> >> N.
> >>
> >>
> >>
> >> On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <lb...@gmail.com> wrote:
> >>
> >>> Dear all,
> >>>
> >>> By the way, my HBase is in the pseudo-distributed mode. Thanks!
> >>>
> >>> Best regards,
> >>> Bing
> >>>
> >>> On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <lb...@gmail.com> wrote:
> >>>
> >>> > Dear all,
> >>> >
> >>> > According to my experiences, it is very slow for HBase to save data?
> >>> Am I
> >>> > right?
> >>> >
> >>> > For example, today I need to save data in a HashMap to HBase. It took
> >>> > about more than three hours. However when saving the same HashMap in
> a
> >>> file
> >>> > in the text format with the redirected System.out, it took only 4.5
> >>> seconds!
> >>> >
> >>> > Why is HBase so slow? It is indexing?
> >>> >
> >>> > My code to save data in HBase is as follows. I think the code must be
> >>> > correct.
> >>> >
> >>> >         ......
> >>> >         public synchronized void
> >>> > AddVirtualOutgoingHHNeighbors(ConcurrentHashMap<String,
> >>> > ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap, int
> >>> timingScale)
> >>> >         {
> >>> >                 List<Put> puts = new ArrayList<Put>();
> >>> >
> >>> >                 String hhNeighborRowKey;
> >>> >                 Put hubKeyPut;
> >>> >                 Put groupKeyPut;
> >>> >                 Put topGroupKeyPut;
> >>> >                 Put timingScalePut;
> >>> >                 Put nodeKeyPut;
> >>> >                 Put hubNeighborTypePut;
> >>> >
> >>> >                 for (Map.Entry<String, ConcurrentHashMap<String,
> >>> > Set<String>>> sourceHubGroupNeighborEntry :
> >>> hhOutNeighborMap.entrySet())
> >>> >                 {
> >>> >                         for (Map.Entry<String, Set<String>>
> >>> > groupNeighborEntry :
> sourceHubGroupNeighborEntry.getValue().entrySet())
> >>> >                         {
> >>> >                                 for (String neighborKey :
> >>> > groupNeighborEntry.getValue())
> >>> >                                 {
> >>> >                                         hhNeighborRowKey =
> >>> > NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
> >>> > Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
> >>> > groupNeighborEntry.getKey() + timingScale + neighborKey);
> >>> >
> >>> >                                         hubKeyPut = new
> >>> > Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >
> >>> >
> hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
> >>> > Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
> >>> >                                         puts.add(hubKeyPut);
> >>> >
> >>> >                                         groupKeyPut = new
> >>> > Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >
> >>> >
> >>>
> groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
> >>> > Bytes.toBytes(groupNeighborEntry.getKey()));
> >>> >                                         puts.add(groupKeyPut);
> >>> >
> >>> >                                         topGroupKeyPut = new
> >>> > Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >
> >>> >
> >>>
> topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >
> Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
> >>> >
> >>>
> Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
> >>> >                                         puts.add(topGroupKeyPut);
> >>> >
> >>> >                                         timingScalePut = new
> >>> > Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >
> >>> >
> >>>
> timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> >
> Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
> >>> > Bytes.toBytes(timingScale));
> >>> >                                         puts.add(timingScalePut);
> >>> >
> >>> >                                         nodeKeyPut = new
> >>> > Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >
> >>> >
> >>>
> nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
> >>> > Bytes.toBytes(neighborKey));
> >>> >                                         puts.add(nodeKeyPut);
> >>> >
> >>> >                                         hubNeighborTypePut = new
> >>> > Put(Bytes.toBytes(hhNeighborRowKey));
> >>> >
> >>> >
> >>>
> hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> >>> > Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
> >>> > Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
> >>> >                                         puts.add(hubNeighborTypePut);
> >>> >                                 }
> >>> >                         }
> >>> >                 }
> >>> >
> >>> >                 try
> >>> >                 {
> >>> >                         this.neighborTable.put(puts);
> >>> >                 }
> >>> >                 catch (IOException e)
> >>> >                 {
> >>> >                         e.printStackTrace();
> >>> >                 }
> >>> >         }
> >>> >         ......
> >>> >
> >>> > Thanks so much!
> >>> >
> >>> > Best regards,
> >>> > Bing
> >>> >
> >>>
> >>
> >>
> >
>

Re: HBase Is So Slow To Save Data?

Posted by N Keywal <nk...@gmail.com>.
It's not useful here: if you have a memory issue, it's when you're using the
list, not when you have finished with it and set it to null.
You need to monitor the memory consumption of the JVM, on both the client &
the server.
Google around these keywords, there are many examples on the web.
Google ArrayList initialization as well.

Note as well that what matters is not the memory size of the structure on
disk but the size of the "List<Put> puts = new ArrayList<Put>();" before
the table put.
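
One way to keep that list from growing without bound is to pre-size it and
hand it off in fixed-size batches. The following is only a minimal sketch
under stated assumptions: BoundedBatcher, its methods and the batch size are
hypothetical names invented for illustration, and the sink stands in for
whatever actually writes the batch (for example a call to the table's put
with a list).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper (not part of HBase): buffers items and hands them
// to a sink in batches of at most batchSize elements.
final class BoundedBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer;
    private int flushes = 0;

    BoundedBatcher(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
        // Pre-sized, so the backing array never has to grow and be copied.
        this.buffer = new ArrayList<>(batchSize);
    }

    // Adds one item and flushes as soon as the buffer is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Sends whatever is buffered to the sink; does nothing if empty.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // the sink gets its own copy
        buffer.clear();
        flushes++;
    }

    int flushCount() {
        return flushes;
    }
}
```

In the method from the original post, the sink would presumably wrap
this.neighborTable.put(batch) in a try/catch for IOException, with add()
called inside the innermost loop and one final flush() after the loops. That
way the client holds at most one batch in memory at any time.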

On Wed, Aug 29, 2012 at 5:42 PM, Bing Li <lb...@gmail.com> wrote:

> Dear N Keywal,
>
> Thanks so much for your reply!
>
> The total amount of data is about 110M. The available memory is enough, 2G.
>
> In Java, I just set the collection to null so that it can be garbage
> collected. Do you think that is fine?
>
> Best regards,
> Bing

Re: HBase Is So Slow To Save Data?

Posted by Bing Li <lb...@gmail.com>.
Dear N Keywal,

Thanks so much for your reply!

The total amount of data is about 110M. The available memory is enough, 2G.

In Java, I just set the collection to null so that it can be garbage
collected. Do you think that is fine?

Best regards,
Bing

On Wed, Aug 29, 2012 at 11:22 PM, N Keywal <nk...@gmail.com> wrote:

> Hi Bing,
>
> You should expect HBase to be slower in the generic case:
> 1) it writes much more data (see the HBase data model), with extra column
> qualifiers, timestamps & so on.
> 2) the data is written multiple times: once in the write-ahead log, once
> per replica on the datanodes & so on.
> 3) there are inter-process calls & inter-machine calls on the critical
> path.
>
> This is the cost of the atomicity, reliability and scalability features.
> With these features in mind, HBase is reasonably fast to save data on a
> cluster.
>
> In your specific case (without points 2 & 3 above), the performance
> seems to be very bad.
>
> You should first look at:
> - how much time is spent in the put vs. preparing the list
> - do you have garbage collection going on? Or even swapping?
> - what's the size of your final array vs. the available memory?
>
> Cheers,
>
> N.

Re: HBase Is So Slow To Save Data?

Posted by Mohammad Tariq <do...@gmail.com>.
I don't think 2G is sufficient, keeping in mind that all the Hadoop
daemons are running on the same box (and maybe your IDE and other stuff too).

On Wednesday, August 29, 2012, Mohammad Tariq <do...@gmail.com> wrote:
> Pseudo-distributed setup could be a cause.

-- 
Regards,
    Mohammad Tariq

Re: HBase Is So Slow To Save Data?

Posted by Mohammad Tariq <do...@gmail.com>.
Pseudo-distributed setup could be a cause.

On Wednesday, August 29, 2012, N Keywal <nk...@gmail.com> wrote:
> Hi Bing,
>
> You should expect HBase to be slower in the generic case:
> 1) it writes much more data (see the HBase data model), with extra column
> qualifiers, timestamps & so on.
> 2) the data is written multiple times: once in the write-ahead log, once
> per replica on the datanodes & so on.
> 3) there are inter-process calls & inter-machine calls on the critical
> path.
>
> This is the cost of the atomicity, reliability and scalability features.
> With these features in mind, HBase is reasonably fast to save data on a
> cluster.
>
> In your specific case (without points 2 & 3 above), the performance
> seems to be very bad.
>
> You should first look at:
> - how much time is spent in the put vs. preparing the list
> - do you have garbage collection going on? Or even swapping?
> - what's the size of your final array vs. the available memory?
>
> Cheers,
>
> N.

-- 
Regards,
    Mohammad Tariq

Re: HBase Is So Slow To Save Data?

Posted by N Keywal <nk...@gmail.com>.
Hi Bing,

You should expect HBase to be slower in the generic case:
1) it writes much more data (see the HBase data model), with extra column
qualifiers, timestamps & so on.
2) the data is written multiple times: once in the write-ahead log, once
per replica on the datanodes & so on.
3) there are inter-process calls & inter-machine calls on the critical path.
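
To give a rough feel for point 1, here is a small back-of-the-envelope
sketch. It assumes the classic KeyValue layout described in the HBase
reference guide, where every cell carries its row key, family, qualifier,
an 8-byte timestamp and a few length/type fields alongside the value. The
class name, the sample strings and the exact figures are illustrative
assumptions only; actual sizes depend on the HBase version, encoding and
compression.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical estimator, not an HBase API: approximates the serialized
// size of one cell under the classic KeyValue layout.
final class CellSizeEstimate {
    // Fixed per-cell fields: key length (4) + value length (4)
    // + row length (2) + family length (1) + timestamp (8) + key type (1).
    static final int FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1;

    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Every cell repeats the row key, family and qualifier next to its value.
    static int cellBytes(String row, String family, String qualifier, String value) {
        return FIXED_OVERHEAD + utf8Length(row) + utf8Length(family)
                + utf8Length(qualifier) + utf8Length(value);
    }

    public static void main(String[] args) {
        // Made-up sample data, only to compare payload size with cell size.
        String value = "hub-42";
        int stored = cellBytes("row-0123456789abcdef", "neighbor", "hub_key", value);
        System.out.println("value bytes: " + utf8Length(value));
        System.out.println("cell bytes:  " + stored);
    }
}
```

A text file would store little more than the value itself, whereas here the
row key and column names are repeated for each of the six columns written
per row.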

This is the cost of the atomicity, reliability and scalability features.
With these features in mind, HBase is reasonably fast to save data on a
cluster.

In your specific case (without points 2 & 3 above), the performance
seems to be very bad.

You should first look at:
- how much time is spent in the put vs. preparing the list
- do you have garbage collection going on? Or even swapping?
- what's the size of your final array vs. the available memory?
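
The first two checks can be approximated from inside the client JVM with
standard library calls only. This is a minimal sketch, not a profiler: the
class PutPhaseProbe and its method names are made up for illustration, and
the two lambdas in main are stand-ins for the loops that build the list and
for the table put.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical probe: reports wall-clock time, GC time and heap growth
// for one phase of the work.
final class PutPhaseProbe {
    // Total time in ms that this JVM has spent in garbage collection so far.
    public static long gcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    // Heap currently in use, in bytes.
    public static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Runs one phase, prints its cost, and returns the elapsed time in ms.
    public static long timePhase(String name, Runnable phase) {
        long gcBefore = gcMillis();
        long heapBefore = usedHeapBytes();
        long start = System.nanoTime();
        phase.run();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.printf("%s: %d ms wall, %d ms GC, heap delta %d KB%n",
                name, elapsedMs, gcMillis() - gcBefore,
                (usedHeapBytes() - heapBefore) / 1024);
        return elapsedMs;
    }

    public static void main(String[] args) {
        List<byte[]> puts = new ArrayList<>();
        // Phase 1: stand-in for the nested loops that build the List<Put>.
        timePhase("prepare list", () -> {
            for (int i = 0; i < 100_000; i++) {
                puts.add(new byte[64]);
            }
        });
        // Phase 2: stand-in for this.neighborTable.put(puts).
        timePhase("table put", puts::clear);
    }
}
```

If most of the time is reported in the second phase, the server side is the
place to look. A large GC share in either phase would point at memory
pressure in the client. Swapping is not visible from these calls and has to
be checked with operating system tools.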

Cheers,

N.


On Wed, Aug 29, 2012 at 4:08 PM, Bing Li <lb...@gmail.com> wrote:

> Dear all,
>
> By the way, my HBase is in the pseudo-distributed mode. Thanks!
>
> Best regards,
> Bing

Re: HBase Is So Slow To Save Data?

Posted by Bing Li <lb...@gmail.com>.
Dear all,

By the way, my HBase instance is running in pseudo-distributed mode. Thanks!

Best regards,
Bing

On Wed, Aug 29, 2012 at 10:04 PM, Bing Li <lb...@gmail.com> wrote:

> Dear all,
>
> In my experience, HBase is very slow at saving data. Am I right?
>
> For example, today I needed to save the data in a HashMap to HBase. It took
> more than three hours. However, saving the same HashMap to a text file
> through a redirected System.out took only 4.5 seconds!
>
> Why is HBase so slow? Is it indexing?
>
> My code to save data in HBase is as follows. I think the code must be
> correct.
>
>         ......
>         public synchronized void
> AddVirtualOutgoingHHNeighbors(ConcurrentHashMap<String,
> ConcurrentHashMap<String, Set<String>>> hhOutNeighborMap, int timingScale)
>         {
>                 List<Put> puts = new ArrayList<Put>();
>
>                 String hhNeighborRowKey;
>                 Put hubKeyPut;
>                 Put groupKeyPut;
>                 Put topGroupKeyPut;
>                 Put timingScalePut;
>                 Put nodeKeyPut;
>                 Put hubNeighborTypePut;
>
>                 for (Map.Entry<String, ConcurrentHashMap<String,
> Set<String>>> sourceHubGroupNeighborEntry : hhOutNeighborMap.entrySet())
>                 {
>                         for (Map.Entry<String, Set<String>>
> groupNeighborEntry : sourceHubGroupNeighborEntry.getValue().entrySet())
>                         {
>                                 for (String neighborKey :
> groupNeighborEntry.getValue())
>                                 {
>                                         hhNeighborRowKey =
> NeighborStructure.HUB_HUB_NEIGHBOR_ROW +
> Tools.GetAHash(sourceHubGroupNeighborEntry.getKey() +
> groupNeighborEntry.getKey() + timingScale + neighborKey);
>
>                                         hubKeyPut = new
> Put(Bytes.toBytes(hhNeighborRowKey));
>
> hubKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_HUB_KEY_COLUMN),
> Bytes.toBytes(sourceHubGroupNeighborEntry.getKey()));
>                                         puts.add(hubKeyPut);
>
>                                         groupKeyPut = new
> Put(Bytes.toBytes(hhNeighborRowKey));
>
> groupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_GROUP_KEY_COLUMN),
> Bytes.toBytes(groupNeighborEntry.getKey()));
>                                         puts.add(groupKeyPut);
>
>                                         topGroupKeyPut = new
> Put(Bytes.toBytes(hhNeighborRowKey));
>
> topGroupKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TOP_GROUP_KEY_COLUMN),
> Bytes.toBytes(GroupRegistry.WWW().GetParentGroupKey(groupNeighborEntry.getKey())));
>                                         puts.add(topGroupKeyPut);
>
>                                         timingScalePut = new
> Put(Bytes.toBytes(hhNeighborRowKey));
>
> timingScalePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TIMING_SCALE_COLUMN),
> Bytes.toBytes(timingScale));
>                                         puts.add(timingScalePut);
>
>                                         nodeKeyPut = new
> Put(Bytes.toBytes(hhNeighborRowKey));
>
> nodeKeyPut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_NODE_KEY_COLUMN),
> Bytes.toBytes(neighborKey));
>                                         puts.add(nodeKeyPut);
>
>                                         hubNeighborTypePut = new
> Put(Bytes.toBytes(hhNeighborRowKey));
>
> hubNeighborTypePut.add(Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_FAMILY),
> Bytes.toBytes(NeighborStructure.HUB_HUB_NEIGHBOR_TYPE_COLUMN),
> Bytes.toBytes(SocialRole.VIRTUAL_NEIGHBOR));
>                                         puts.add(hubNeighborTypePut);
>                                 }
>                         }
>                 }
>
>                 try
>                 {
>                         this.neighborTable.put(puts);
>                 }
>                 catch (IOException e)
>                 {
>                         e.printStackTrace();
>                 }
>         }
>         ......
>
> Thanks so much!
>
> Best regards,
> Bing
>
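
One thing visible in the quoted code: every row key gets six separate Put
objects, one per column, so the list handed to neighborTable.put(puts) holds
six entries per neighbor. An HBase Put can carry several columns for the same
row, so the six could be folded into one. Whether this accounts for the
three-hour run is not established in this message. The sketch below only
counts objects using plain Java collections and does not call HBase;
RowMutation is a hypothetical stand-in for Put, and the qualifier names are
illustrative rather than the NeighborStructure constants.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PutGrouping {
    // Hypothetical stand-in for an HBase Put: one row key plus its columns.
    static final class RowMutation {
        final String rowKey;
        final Map<String, String> columns = new LinkedHashMap<>();

        RowMutation(String rowKey) {
            this.rowKey = rowKey;
        }

        RowMutation add(String qualifier, String value) {
            columns.put(qualifier, value);
            return this;
        }
    }

    // Illustrative qualifier names (assumed, not taken from the original code).
    static final String[] QUALIFIERS = {
        "hubKey", "groupKey", "topGroupKey", "timingScale", "nodeKey", "type"
    };

    // Mirrors the quoted loop body: a new mutation for every column of the row.
    static List<RowMutation> onePerColumn(String rowKey, String[] values) {
        List<RowMutation> out = new ArrayList<>();
        for (int i = 0; i < QUALIFIERS.length; i++) {
            out.add(new RowMutation(rowKey).add(QUALIFIERS[i], values[i]));
        }
        return out;
    }

    // Alternative: a single mutation that carries all columns of the row.
    static List<RowMutation> onePerRow(String rowKey, String[] values) {
        RowMutation mutation = new RowMutation(rowKey);
        for (int i = 0; i < QUALIFIERS.length; i++) {
            mutation.add(QUALIFIERS[i], values[i]);
        }
        List<RowMutation> out = new ArrayList<>();
        out.add(mutation);
        return out;
    }

    public static void main(String[] args) {
        String[] values = {"hub1", "groupA", "topGroup", "3", "node9", "virtual"};
        System.out.println("per column: " + onePerColumn("row-1", values).size());
        System.out.println("per row:    " + onePerRow("row-1", values).size());
    }
}
```

Both variants describe the same six cells for the row; the second simply
hands the client one object per row instead of six.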