Posted to user@hbase.apache.org by Shuja Rehman <sh...@gmail.com> on 2011/02/04 13:43:46 UTC

asynchronous hbase for batch inserts?

Hi

I was wondering if anyone can share a working example of batch inserting
using the following asynchronous HBase client:

http://tsunanet.net/~tsuna/asynchbase/api/

More specifically, can anyone provide an example equivalent to this?

List<Put> list = new ArrayList<Put>();
for (int i = 0; i < N; i++) {
  list.add(putitem[i]);
}
htable.put(list);

Thanks

-- 
Regards
Shuja-ur-Rehman Baig
<http://pk.linkedin.com/in/shujamughal>

Re: asynchronous hbase for batch inserts?

Posted by Stack <st...@duboce.net>.
Check out OpenTSDB.  Its code base has a bunch of examples of asynchbase
usage; grep for HBaseClient or 'client' in the OpenTSDB sources.
St.Ack

On Fri, Feb 4, 2011 at 4:43 AM, Shuja Rehman <sh...@gmail.com> wrote:
> Hi
>
> I was wondering if anyone can share a working example of batch inserting
> using the following asynchronous HBase client:
>
> http://tsunanet.net/~tsuna/asynchbase/api/
>
> More specifically, can anyone provide an example equivalent to this?
>
> List<Put> list = new ArrayList<Put>();
> for (int i = 0; i < N; i++) {
>   list.add(putitem[i]);
> }
> htable.put(list);
>
> Thanks
>
> --
> Regards
> Shuja-ur-Rehman Baig
> <http://pk.linkedin.com/in/shujamughal>
>

Re: Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: 12.34.56.78:60000

Posted by Jean-Daniel Cryans <jd...@apache.org>.
You should take a look at the master log and see if it looks normal or
not. Maybe also check if the process is running.
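
From the client side, a quick sanity check is to ask HBaseAdmin whether a
master is reachable at all.  This is only a sketch (not from the original
mail); it assumes the standard HBaseAdmin.checkHBaseAvailable() call and
that the same hbase-site.xml is on the classpath:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class MasterCheck {
  public static void main(String[] args) throws Exception {
    // Reads hbase-site.xml from the classpath, including hbase.zookeeper.quorum.
    Configuration conf = HBaseConfiguration.create();
    // Throws MasterNotRunningException if no master can be contacted.
    HBaseAdmin.checkHBaseAvailable(conf);
    System.out.println("Master is up and reachable.");
  }
}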

J-D

On Sat, Feb 5, 2011 at 11:43 AM, Jérôme Verstrynge <jv...@gmail.com> wrote:
> Hi,
>
> I have installed Cloudera's CDH3 successfully on a node. I have written a
> small application attempting to connect to it. My hbase-site.xml is very
> simple:
>
> <?xml version="1.0"?>
> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
> <configuration>
>   <property>
>     <name>hbase.zookeeper.quorum</name>
>     <value>12.34.56.78</value>
>   </property>
> </configuration>
>
> The Zookeeper connection is successful, but I get the following error
> message systematically:
>
> Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: 12.34.56.78:60000
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:291)
>        at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:72)
>        at HBaseTest.MyLittleHBaseClient.main(MyLittleHBaseClient.java:17)
>
> Does anyone know what is happening? What would be the solution?
>
> Thanks!
>

Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: 12.34.56.78:60000

Posted by Jérôme Verstrynge <jv...@gmail.com>.
Hi,

I have installed Cloudera's CDH3 successfully on a node. I have written 
a small application attempting to connect to it. My hbase-site.xml is 
very simple:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>12.34.56.78</value>
  </property>
</configuration>

The Zookeeper connection is successful, but I get the following error 
message systematically:

Exception in thread "main" org.apache.hadoop.hbase.MasterNotRunningException: 12.34.56.78:60000
        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getMaster(HConnectionManager.java:291)
        at org.apache.hadoop.hbase.client.HBaseAdmin.<init>(HBaseAdmin.java:72)
        at HBaseTest.MyLittleHBaseClient.main(MyLittleHBaseClient.java:17)

Does anyone know what is happening? What would be the solution?

Thanks!

Re: asynchronous hbase for batch inserts?

Posted by tsuna <ts...@gmail.com>.
On Fri, Feb 4, 2011 at 4:43 AM, Shuja Rehman <sh...@gmail.com> wrote:
> More specifically, can anyone provide an example equivalent to this?
>
> List<Put> list = new ArrayList<Put>();
> for (int i = 0; i < N; i++) {
>   list.add(putitem[i]);
> }
> htable.put(list);

The equivalent would be:

Callback<Object, Object> callback = new Callback<Object, Object>() {
  public Object call(Object arg) {
    // Do whatever you want on a successful write.
    return arg;
  }
  public String toString() {
    return "handle successful write";
  }
};

Callback<Object, Object> errback = new Callback<Object, Object>() {
  public Object call(Object arg) {
    // Do whatever you want on a failed write.
    return arg;
  }
  public String toString() {
    return "handle failed write";
  }
};

PutRequest[] putitem = ...;
HBaseClient client = ...;
for (int i = 0; i < N; i++) {
  client.put(putitem[i]).addCallbacks(callback, errback);
}

For each PutRequest, either `callback' or `errback' will be called
asynchronously from a different thread (you can't control which)
whenever the request has completed.  If you only want to handle
failures (which is common), you can do:
for (int i = 0; i < N; i++) {
  client.put(putitem[i]).addErrback(errback);
}

For more on the Deferred API, please read
http://www.tsunanet.net/~tsuna/async/api/com/stumbleupon/async/Deferred.html
Deferred is a very powerful API for any kind of asynchronous processing.


So overall the code remains the same except that:
 * You use callbacks to get the result of your operation asynchronously.
 * You don't give the client a whole list at once; you hand it requests
one by one, and it does the batching internally anyway.
 * You must be prepared to handle the response from another thread
(you don't know which), so your callbacks need to be thread-safe and
must only call thread-safe APIs (see the sketch right after this list).
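
To make that last point concrete, here is a sketch (mine, not from the
original mail) of an errback that only touches state which is safe to
share between threads, so it doesn't matter which I/O thread ends up
invoking it:

import java.util.concurrent.atomic.AtomicLong;
import com.stumbleupon.async.Callback;

final AtomicLong failures = new AtomicLong();

final Callback<Object, Object> countingErrback = new Callback<Object, Object>() {
  public Object call(final Object arg) {
    // arg is the exception that made the write fail.
    failures.incrementAndGet();                // AtomicLong is thread-safe
    System.err.println("put failed: " + arg);  // System.err can be shared safely
    return arg;  // pass the error on to any further errbacks
  }
  public String toString() {
    return "count failed writes";
  }
};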

Internally, asynchbase will route each PutRequest to the right region
server.  It uses both a size-based and time-based flush threshold
(e.g. either after N milliseconds or M edits, whichever reaches its
threshold first).  Depending on your workload, asynchbase can achieve
higher batching efficiency than HTable and lower latency.  In OpenTSDB
I've seen dramatic improvements of up to an order of magnitude.  When
failures occur due to region splits, asynchbase also behaves better:
instead of retrying all the failed edits in one batch, it retries each
edit individually, so edits that aren't going to the region being split
don't have to wait for the split to finish, unlike with HTable (with
multiPut at least).
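
If the defaults don't fit your workload, the time-based threshold can be
tuned on the client.  A minimal sketch (the 500 ms value and the
"zkquorum" spec are made up for illustration):

final HBaseClient client = new HBaseClient("zkquorum");
client.setFlushInterval((short) 500);  // buffer edits for at most ~500 ms
// ... issue client.put(...) calls as above ...
client.flush().joinUninterruptibly();  // force out anything still buffered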

-- 
Benoit "tsuna" Sigoure
Software Engineer @ www.StumbleUpon.com