Posted to user@pig.apache.org by Rajgopal Vaithiyanathan <ra...@gmail.com> on 2012/04/26 13:38:46 UTC

Best Hbase Storage for PIG

Hey all,

The default HBaseStorage() takes a very long time for puts.

In a cluster of 5 machines, inserting 175 million records took 4 hours 45
minutes.
Question: is this good enough?
Each machine has 32 cores and 32 GB of RAM, with 7 x 600 GB hard disks.
HBase's heap has been configured to 8 GB.
If the put speed is low, how can I improve it?

I tried tweaking the TableOutputFormat by increasing the WriteBufferSize to
24 MB and adding a multi-put feature (collecting 10,000 Puts in an ArrayList
and submitting them as one batch). After doing this, it started throwing:

java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
Call to slave1/172.21.208.176:60020 failed on socket timeout exception:
java.net.SocketTimeoutException: 60000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected
local=/172.21.208.176:41135remote=slave1/
172.21.208.176:60020]

This, I assume, is because the clients took too long to put.
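
Roughly, the batching change looks like this (a minimal sketch, not the
actual OptimisedTableOutputFormat code; the class name is illustrative,
while the 24 MB buffer and the 10,000-put batch are the values described
above):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

// Sketch of a batched writer: buffer Puts client-side and submit them
// as one multi-put, with a 24 MB write buffer and auto-flush disabled.
public class BatchedPutWriter {
    private static final int BATCH_SIZE = 10000;     // puts per batch, as above
    private final HTable table;
    private final List<Put> buffer = new ArrayList<Put>(BATCH_SIZE);

    public BatchedPutWriter(Configuration conf, String tableName) throws IOException {
        table = new HTable(conf, tableName);
        table.setAutoFlush(false);                   // buffer on the client
        table.setWriteBufferSize(24L * 1024 * 1024); // 24 MB, as above
    }

    public void write(Put put) throws IOException {
        buffer.add(put);
        if (buffer.size() >= BATCH_SIZE) {
            table.put(buffer);                       // one batched put
            buffer.clear();
        }
    }

    public void close() throws IOException {
        if (!buffer.isEmpty()) {
            table.put(buffer);
            buffer.clear();
        }
        table.flushCommits();                        // drain the write buffer
        table.close();
    }
}

With auto-flush off, each flush becomes one large multi-put RPC per region
server, so a region server that is busy (compacting, splitting, blocked on
flushes) can hold a single call past the 60-second RPC timeout, which would
match the timeouts in the log below.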

The detailed log from one of the tasks follows.

I've 'censored' some of the details, which I assume is okay! :P
2012-04-23 20:07:12,815 INFO org.apache.hadoop.util.NativeCodeLoader:
Loaded the native-hadoop library
2012-04-23 20:07:13,097 WARN
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already
exists!
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:zookeeper.version=3.4.2-1221870, built on 12/21/2011 20:46 GMT
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:host.name=*****.*****
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.version=1.6.0_22
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.vendor=Sun Microsystems Inc.
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.home=/usr/lib/jvm/java-6-openjdk/jre
2012-04-23 20:07:13,787 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.class.path=****************************
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.library.path=**********************
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.io.tmpdir=***************************
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:java.compiler=<NA>
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.name=Linux
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.arch=amd64
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:os.version=2.6.38-8-server
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.name=raj

2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.home=*********
2012-04-23 20:07:13,788 INFO org.apache.zookeeper.ZooKeeper: Client
environment:user.dir=**********************:
2012-04-23 20:07:13,790 INFO org.apache.zookeeper.ZooKeeper: Initiating
client connection, connectString=master:2181 sessionTimeout=180000
watcher=hconnection
2012-04-23 20:07:13,822 INFO org.apache.zookeeper.ClientCnxn: Opening
socket connection to server /172.21.208.180:2181
2012-04-23 20:07:13,823 INFO
org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: The identifier of
this process is 72909@slave1.slave1
2012-04-23 20:07:13,825 INFO org.apache.zookeeper.ClientCnxn: Socket
connection established to master/172.21.208.180:2181, initiating session
2012-04-23 20:07:13,840 INFO org.apache.zookeeper.ClientCnxn: Session
establishment complete on server master/172.21.208.180:2181, sessionid =
0x136dfa124e90015, negotiated timeout = 180000
2012-04-23 20:07:14,129 INFO com.raj.OptimisedTableOutputFormat: Created
table instance for index
2012-04-23 20:07:14,184 INFO org.apache.hadoop.util.ProcessTree: setsid
exited with exit code 0
2012-04-23 20:07:14,205 INFO org.apache.hadoop.mapred.Task:  Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@4513e9fd
2012-04-23 20:08:49,852 WARN
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Failed all from
region=index,,1335191775144.2e69ca9ad2a2d92699aa34b1dc37f1bb.,
hostname=slave1, port=60020
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
Call to slave1/172.21.208.176:60020 failed on socket timeout exception:
java.net.SocketTimeoutException: 60000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected
local=/172.21.208.176:41135remote=slave1/
172.21.208.176:60020]
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:773)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
    at
com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:142)
    at
com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:1)
    at com.raj.HBaseStorage.putNext(HBaseStorage.java:583)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
    at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.net.SocketTimeoutException: Call to slave1/
172.21.208.176:60020 failed on socket timeout exception:
java.net.SocketTimeoutException: 60000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected
local=/172.21.208.176:41135remote=slave1/
172.21.208.176:60020]
    at
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:930)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
    at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy7.multi(Unknown Source)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected
local=/172.21.208.176:41135remote=slave1/
172.21.208.176:60020]
    at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
    at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
2012-04-23 20:09:51,018 WARN
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation:
Failed all from
region=index,,1335191775144.2e69ca9ad2a2d92699aa34b1dc37f1bb.,
hostname=slave1, port=60020
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException:
Call to slave1/172.21.208.176:60020 failed on socket timeout exception:
java.net.SocketTimeoutException: 60000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected
local=/172.21.208.176:41150remote=slave1/
172.21.208.176:60020]
    at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:252)
    at java.util.concurrent.FutureTask.get(FutureTask.java:111)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1557)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1409)
    at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:900)
    at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:773)
    at org.apache.hadoop.hbase.client.HTable.put(HTable.java:760)
    at
com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:142)
    at
com.raj.OptimisedTableOutputFormat$TableRecordWriter.write(OptimisedTableOutputFormat.java:1)
    at com.raj.HBaseStorage.putNext(HBaseStorage.java:583)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:139)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:98)
    at
org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
    at
org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:269)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:262)
    at
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:416)
    at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
    at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.net.SocketTimeoutException: Call to slave1/
172.21.208.176:60020 failed on socket timeout exception:
java.net.SocketTimeoutException: 60000 millis timeout while waiting for
channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected
local=/172.21.208.176:41150remote=slave1/
172.21.208.176:60020]
    at
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:930)
    at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:903)
    at
org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:150)
    at $Proxy7.multi(Unknown Source)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1386)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3$1.call(HConnectionManager.java:1384)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithoutRetries(HConnectionManager.java:1365)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1383)
    at
org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation$3.call(HConnectionManager.java:1381)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:679)
Caused by: java.net.SocketTimeoutException: 60000 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected
local=/172.21.208.176:41150remote=slave1/
172.21.208.176:60020]
    at
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
    at
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
    at java.io.FilterInputStream.read(FilterInputStream.java:133)
    at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
    at java.io.DataInputStream.readInt(DataInputStream.java:387)
    at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
    at
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)

-- 
Thanks and Regards,
Raj

Re: Best Hbase Storage for PIG

Posted by Rajgopal Vaithiyanathan <ra...@gmail.com>.
@raghu, good idea. Will do the scan benchmark soon.
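
For the scan benchmark, something as simple as timing a full scan and
counting rows should do (a minimal sketch; the table name "index" is taken
from the logs above, and the caching value is just a starting point):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

// Sketch: time a full table scan to get a rows/second baseline.
public class ScanBench {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "index");
        Scan scan = new Scan();
        scan.setCaching(1000);                 // rows fetched per RPC
        long rows = 0;
        long start = System.currentTimeMillis();
        ResultScanner scanner = table.getScanner(scan);
        for (Result r : scanner) {
            rows++;
        }
        scanner.close();
        long millis = System.currentTimeMillis() - start;
        System.out.println(rows + " rows in " + millis + " ms ("
            + (rows * 1000L / Math.max(millis, 1)) + " rows/sec)");
        table.close();
    }
}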


On Fri, Apr 27, 2012 at 10:08 PM, Raghu Angadi <ra...@apache.org> wrote:

> A lot of factors can affect HBase performance; it could even be something
> hardware related (a slow network, or disks).
>
> How fast can you scan? Does that work well?
> You could take a jstack of the clients (reducers) and region servers while
> you are writing, and post them here and/or to the hbase list. This would
> point to where the bottleneck is.
>
> Raghu.
>
> On Thu, Apr 26, 2012 at 11:13 PM, Rajgopal Vaithiyanathan <raja.fire@gmail.com> wrote:
>
> > @doug
> > Regarding monotonically increasing keys, I took care of that by
> > randomizing the data order.
> > Regarding pre-created regions - I did not know I could do that. Thanks.
> > But when I looked into the case studies, I saw the section "HBase Region
> > With Non-Local Data". Will this be a problem when I pre-create the regions?
> >
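For reference, pre-creating regions happens at table-creation time, roughly
like this (a minimal sketch against the HBaseAdmin API; the column family
name "cf" and the split points are illustrative, not taken from the thread):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: create the table with split points so writes spread across
// region servers from the start instead of hammering a single region.
public class PreSplitTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        HTableDescriptor desc = new HTableDescriptor("index");
        desc.addFamily(new HColumnDescriptor("cf"));   // the single column family

        // Illustrative split points over the key space; pick boundaries that
        // match the real distribution of the 50-character keys.
        byte[][] splits = new byte[][] {
            Bytes.toBytes("2"), Bytes.toBytes("4"), Bytes.toBytes("6"),
            Bytes.toBytes("8"), Bytes.toBytes("b"), Bytes.toBytes("d"),
            Bytes.toBytes("f"),
        };
        admin.createTable(desc, splits);               // splits.length + 1 regions
    }
}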
> >
> > @michel
> > The schema is simple: one column family, into which we'll insert a
> > maximum of 10 columns. 4 columns are compulsory and the other 6 are
> > sparsely filled.
> >
> > KEY:  a string of 50 characters
> > col1: int
> > col2: string of 20 characters
> > col3: string of 20 characters
> > col4: int
> > col5: int [sparse]
> > col6: float [sparse]
> > col7: string of 3 characters [sparse]
> > col8: string of 3 characters [sparse]
> > col9: string of 3 characters [sparse]
> >
> > I've kept max.reduce.tasks = 16.
> >
> > I haven't set MSLAB. What values do you recommend for my cluster?
> >
> > > "10k rows in a batch put() not really a good idea."
> > Hmm.. should it be less or more ?
> >
> > > "What's your region size?"
> > I did not set hbase.hregion.max.filesize manually; please recommend a
> > value. Neither did I pre-create regions.
> >
> > I'm not saying Pig will be the bottleneck; the output format, the HBase
> > configuration, or the hardware could be. I need suggestions on those.
> >
> > Can I use HFileOutputFormat in this case? Can I get some example snippets?
> >
> > Thanks
> > Raj
> >
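On the HFileOutputFormat question above: the usual flow is an M/R job that
writes HFiles, followed by a bulk load into the table (a minimal sketch;
MyPutMapper, the input/output paths, and the cf:col1 layout are
placeholders, and configureIncrementalLoad expects the target table to
already exist, ideally pre-split):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadSketch {

    // Placeholder mapper: the first tab-separated field is the row key,
    // the second goes into cf:col1 (adjust to the real 10-column layout).
    public static class MyPutMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] fields = line.toString().split("\t");
            byte[] row = Bytes.toBytes(fields[0]);
            Put put = new Put(row);
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(fields[1]));
            ctx.write(new ImmutableBytesWritable(row), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "index");

        Job job = new Job(conf, "hfile-bulk-load");
        job.setJarByClass(BulkLoadSketch.class);
        job.setMapperClass(MyPutMapper.class);
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // Wires in the reducer, partitioner and sort order so the HFiles
        // line up with the table's current region boundaries.
        HFileOutputFormat.configureIncrementalLoad(job, table);

        if (job.waitForCompletion(true)) {
            // Same effect as the completebulkload tool: region servers
            // adopt the finished HFiles instead of replaying puts.
            new LoadIncrementalHFiles(conf).doBulkLoad(new Path(args[1]), table);
        }
    }
}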
> > On Thu, Apr 26, 2012 at 7:11 PM, Michel Segel <michael_segel@hotmail.com> wrote:
> >
> > > Ok...
> > > 5 machines...
> > > Total cluster? Is that 5 DN?
> > > Each machine 1 quad core, 32 GB RAM, 7 x 600 GB; not sure what types of
> > > drives.
> > >
> > >
> > > So let's assume 1 control node running NN, JT, HM, ZK,
> > > and 4 DN running DN, TT, RS.
> > >
> > > We don't know your schema, row size, or network (10 GbE, 1 GbE, 100 MbE?).
> > >
> > > We also don't know if you've tuned GC, implemented MSLAB, etc.
> > >
> > > So 4 hours for 175 million rows? Could be OK.
> > > Write your insert using a Java M/R job and see how long it takes.
> > >
> > > Nor do we know how many slots you have on each box.
> > > 10k rows in a batch put() is not really a good idea.
> > > What's your region size?
> > >
> > >
> > > Lots to think about before you can ask if you are doing the right
> > > thing, or if Pig is the bottleneck.
> > >
> > >
> > > Sent from a remote device. Please excuse any typos...
> > >
> > > Mike Segel
> > >
> > > On Apr 26, 2012, at 7:09 AM, Rajgopal Vaithiyanathan <raja.fire@gmail.com> wrote:
> > >
> > > > My bad.
> > > >
> > > > I had used cat /proc/cpuinfo | grep "processor"  | wc -l
> > > > cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l   => 4
> > > >
> > > > so it's 4 physical CPUs then!
> > > >
> > > > and free -m gives me this:
> > > >              total       used       free     shared    buffers     cached
> > > > Mem:         32174      31382        792          0        123      27339
> > > > -/+ buffers/cache:       3918      28256
> > > > Swap:        24575          0      24575
> > > >
> > > >
> > > >
> > > > On Thu, Apr 26, 2012 at 5:18 PM, Michel Segel <michael_segel@hotmail.com> wrote:
> > > >
> > > >> 32 cores with 32 GB of RAM?
> > > >>
> > > >> Pig isn't fast, but I have to question what you are using for hardware.
> > > >> Who makes a 32-core box?
> > > >> Assuming you mean 16 physical cores.
> > > >>
> > > >> 7 drives? Not enough spindles for the number of cores.
> > > >>
> > > >> Sent from a remote device. Please excuse any typos...
> > > >>
> > > >> Mike Segel
> > > >>



-- 
Thanks and Regards,
Rajgopal Vaithiyanathan.

Re: Best Hbase Storage for PIG

Posted by Raghu Angadi <ra...@apache.org>.
A lot of factors can affect HBase performance; it could even be something
hardware related (a slow network, or disks).

How fast can you scan? Does that work well?
You could take a jstack of the clients (reducers) and region servers while
you are writing, and post them here and/or to the hbase list. This would
point to where the bottleneck is.

Raghu.

> > >>>   at
> > >>>
> > >>
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:311)
> > >>>   at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
> > >>>   at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
> > >>>   at java.io.DataInputStream.readInt(DataInputStream.java:387)
> > >>>   at
> > >>>
> > >>
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:571)
> > >>>   at
> > >>>
> > >>
> >
> org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:505)
> > >>>
> > >>> --
> > >>> Thanks and Regards,
> > >>> Raj
> > >>
> >
>

Re: Best Hbase Storage for PIG

Posted by Rajgopal Vaithiyanathan <ra...@gmail.com>.
@doug
Regarding monotonically increasing keys, I took care of that by randomizing
the order of the data.
Regarding pre-created regions - I did not know I could do that. Thanks.
But when I looked into the case studies, I saw the section "HBase Region
With Non-Local Data". Will that be a problem when I pre-create the regions?
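
From the docs, I'm guessing pre-creating regions would look something like
this - an untested sketch against the HBaseAdmin client API, with the split
points only a guess (it splits on the first byte of the row key, and "cf"
is a placeholder family name):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PreSplit {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);
        HTableDescriptor desc = new HTableDescriptor("index");
        desc.addFamily(new HColumnDescriptor("cf"));
        // 15 split points on the first key byte => 16 regions up front.
        // HBase compares row keys as unsigned bytes, so values past 127
        // wrapping to negative Java bytes still sort correctly.
        byte[][] splits = new byte[15][];
        for (int i = 0; i < splits.length; i++) {
            splits[i] = new byte[] { (byte) ((i + 1) * 16) };
        }
        admin.createTable(desc, splits);
    }
}

Does that look right?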


@michel
The schema is simple:
one column family, into which we insert a maximum of 10 columns. 4 columns
are compulsory, and the other 6 are sparsely filled.

KEY: a string of 50 characters
col1: int
col2: string of 20 characters
col3: string of 20 characters
col4: int
col5: int [sparse]
col6: float [sparse]
col7: string of 3 chars [sparse]
col8: string of 3 chars [sparse]
col9: string of 3 chars [sparse]

I've kept max.reduce.tasks = 16.

I haven't set MSLAB - what values do you recommend for my cluster?
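
For reference, my understanding is that this boils down to the region-server
properties below in hbase-site.xml (both values are placeholders - the 2 MB
chunk size is, as far as I can tell, the default):

<property>
  <name>hbase.hregion.memstore.mslab.enabled</name>
  <value>true</value>
</property>
<property>
  <name>hbase.hregion.memstore.mslab.chunksize</name>
  <value>2097152</value> <!-- 2 MB chunks -->
</property>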

> "10k rows in a batch put() not really a good idea."
Hmm.. should it be less or more ?
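
For concreteness, the batching I described is roughly this (a simplified
sketch of my writer - the batch size and the 24 MB buffer are the values I
mentioned earlier):

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;

public class BatchedWriter {
    private static final int BATCH_SIZE = 10000;
    private final HTable table;
    private final List<Put> batch = new ArrayList<Put>(BATCH_SIZE);

    public BatchedWriter(String tableName) throws IOException {
        table = new HTable(HBaseConfiguration.create(), tableName);
        table.setWriteBufferSize(24 * 1024 * 1024);  // the 24 MB buffer
    }

    public void write(Put put) throws IOException {
        batch.add(put);
        if (batch.size() >= BATCH_SIZE) {  // flush as one multi-put
            table.put(batch);
            batch.clear();
        }
    }

    public void close() throws IOException {
        table.put(batch);  // flush the tail of the batch
        table.flushCommits();
        table.close();
    }
}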

> "What's your region size?"
I did not set hbase.hregion.max.filesize manually.. please recommend. neither
did i pre-create regions..
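
For whenever someone does recommend a value, I assume it just goes into
hbase-site.xml like this (the 1 GB here is only a placeholder, not a
recommendation):

<property>
  <name>hbase.hregion.max.filesize</name>
  <value>1073741824</value> <!-- placeholder: 1 GB -->
</property>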

I'm not saying PIG will be the bottleneck - the output format, the HBase
configuration, or the hardware could be. I need suggestions on all of these.

Can I use HFileOutputFormat in this case? Can I get some example snippets?
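
Is something like this the right direction? An untested sketch pieced
together from the HBase bulk-load docs - MyMapper is a placeholder for a
mapper that emits <ImmutableBytesWritable, Put> pairs:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BulkLoadDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "index-bulkload");
        job.setJarByClass(BulkLoadDriver.class);
        job.setMapperClass(MyMapper.class);  // placeholder mapper
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        // Wires in HFileOutputFormat plus a TotalOrderPartitioner so the
        // generated HFiles line up with the table's current regions:
        HTable table = new HTable(conf, "index");
        HFileOutputFormat.configureIncrementalLoad(job, table);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

As I understand it, the HFiles would then be moved into the table with the
completebulkload tool (LoadIncrementalHFiles), instead of going through
live put()s at all.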

Thanks
Raj

Re: Best Hbase Storage for PIG

Posted by Michel Segel <mi...@hotmail.com>.
Ok...
5 machines...
Total cluster? Is that 5 DN?
Each machine: one quad-core CPU, 32 GB RAM, 7 x 600 GB disks - not sure what
types of drives.


So let's assume one control node running NN, JT, HM, ZK,
and 4 DN running DN, TT, RS.

We don't know your schema, row size, or network (10 GbE, 1 GbE, 100 Mb?).

We also don't know if you've tuned GC, implemented MSLAB, etc.

So ~4 hours for 175 million rows? Could be OK.
Write your insert using a Java M/R job and see how long it takes.
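
Roughly something like this - off the top of my head, not compiled, with a
made-up tab-separated input format - a map-only job pushing Puts through the
stock TableOutputFormat gives you a Pig-free baseline to time:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableOutputFormat;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class PutBaseline {
    static class PutMapper
            extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
        @Override
        protected void map(LongWritable key, Text line, Context ctx)
                throws IOException, InterruptedException {
            String[] f = line.toString().split("\t");  // made-up input format
            Put put = new Put(Bytes.toBytes(f[0]));    // f[0] = the row key
            put.add(Bytes.toBytes("cf"), Bytes.toBytes("col1"),
                    Bytes.toBytes(f[1]));
            ctx.write(new ImmutableBytesWritable(put.getRow()), put);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set(TableOutputFormat.OUTPUT_TABLE, "index");
        Job job = new Job(conf, "put-baseline");
        job.setJarByClass(PutBaseline.class);
        job.setMapperClass(PutMapper.class);
        job.setNumReduceTasks(0);  // map-only, like the Pig store
        job.setOutputFormatClass(TableOutputFormat.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}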

Nor do we know how many slots you have on each box.
10k rows in a batch put() is not really a good idea.
What's your region size?


Lots to think about before you can ask if you are doing the right thing, or if PIG is the bottleneck.


Sent from a remote device. Please excuse any typos...

Mike Segel

Re: Best Hbase Storage for PIG

Posted by Rajgopal Vaithiyanathan <ra...@gmail.com>.
My bad.

I had used cat /proc/cpuinfo | grep "processor" | wc -l, which counts
logical processors and gave the 32.
cat /proc/cpuinfo | grep "physical id" | sort | uniq | wc -l   => 4

So it's 4 physical CPUs, then!

and free -m gives me this:
             total       used       free     shared    buffers     cached
Mem:         32174      31382        792          0        123      27339
-/+ buffers/cache:       3918      28256
Swap:        24575          0      24575

Re: Best Hbase Storage for PIG

Posted by Michel Segel <mi...@hotmail.com>.
32 cores w/ 32 GB of RAM?

Pig isn't fast, but I have to question what you are using for hardware.
Who makes a 32 core box?
Assuming you mean 16 physical cores.

7 drives? Not enough spindles for the number of cores.

Sent from a remote device. Please excuse any typos...

Mike Segel

On Apr 26, 2012, at 6:38 AM, Rajgopal Vaithiyanathan <ra...@gmail.com> wrote:

> Hey all,
> 
> The default - HBaseStorage() takes hell lot of time for puts.
> 
> In a cluster of 5 machines, insertion of 175 Million records took 4Hours 45
> minutes
> Question -  Is this good enough ?
> each machine has 32 cores and 32GB ram with 7*600GB harddisks. HBASE's heap
> has been configured to 8GB.
> If the put speed is low, how can i improve them..?
> 
> I tried tweaking the TableOutputFormat by increasing the WriteBufferSize to
> 24MB, and adding the multi put feature (by adding 10,000 puts in ArrayList
> and putting it as a batch).  After doing this,  it started throwing
> 
> [detailed log and stack traces snipped]

Re: Best Hbase Storage for PIG

Posted by Subir S <su...@gmail.com>.
Could you try completebulkload and see if that works? It should be faster
than HBaseStorage. You could pre-split the table using:

export HADOOP_CLASSPATH=`hbase classpath`
hbase org.apache.hadoop.hbase.util.RegionSplitter -c 10 '<table_name>' -f <cf name>
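
For what it's worth, a rough Java sketch of the same idea done
programmatically: pre-split the table with the HBaseAdmin API, then hand
completed HFiles to the region servers with LoadIncrementalHFiles instead
of issuing puts. The table name 'index' comes from the log above; the
family 'cf', the two-digit key prefix, and the /tmp/hfiles path are
illustrative assumptions, not tested settings:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitAndBulkLoad {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Create the table pre-split into 10 regions so the initial load is
    // spread across all region servers instead of hammering one.
    // Split points "01".."09" assume row keys carry a two-digit 00-09 prefix.
    HBaseAdmin admin = new HBaseAdmin(conf);
    HTableDescriptor desc = new HTableDescriptor("index");
    desc.addFamily(new HColumnDescriptor("cf"));
    byte[][] splits = new byte[9][];
    for (int i = 0; i < splits.length; i++) {
      splits[i] = Bytes.toBytes(String.format("%02d", i + 1));
    }
    admin.createTable(desc, splits);

    // After a MapReduce/Pig job has written HFiles via HFileOutputFormat,
    // load them directly; no puts, no WAL traffic on the hot path.
    HTable table = new HTable(conf, "index");
    new LoadIncrementalHFiles(conf).doBulkLoad(new Path("/tmp/hfiles"), table);
    table.close();
  }
}

Bulk load bypasses the memstore and the write-ahead log entirely, which is
why it usually beats even carefully tuned put batching for a one-off load
of this size.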

On Sat, Apr 28, 2012 at 8:46 PM, M. C. Srivas <mc...@gmail.com> wrote:

> On Thu, Apr 26, 2012 at 4:38 AM, Rajgopal Vaithiyanathan <
> raja.fire@gmail.com> wrote:
>
> > Hey all,
> >
> > The default - HBaseStorage() takes hell lot of time for puts.
> >
> > In a cluster of 5 machines, insertion of 175 Million records took 4Hours
> 45
> > minutes
> > Question -  Is this good enough ?
> > each machine has 32 cores and 32GB ram with 7*600GB harddisks. HBASE's
> heap
> > has been configured to 8GB.
> > If the put speed is low, how can i improve them..?
> >
>
> Raj, how big is each record?
>
>
>
> >
> > I tried tweaking the TableOutputFormat by increasing the WriteBufferSize
> to
> > 24MB, and adding the multi put feature (by adding 10,000 puts in
> ArrayList
> > and putting it as a batch).  After doing this,  it started throwing
> >
> >
>

Re: Best Hbase Storage for PIG

Posted by "M. C. Srivas" <mc...@gmail.com>.
On Thu, Apr 26, 2012 at 4:38 AM, Rajgopal Vaithiyanathan <
raja.fire@gmail.com> wrote:

> Hey all,
>
> The default - HBaseStorage() takes hell lot of time for puts.
>
> In a cluster of 5 machines, insertion of 175 Million records took 4Hours 45
> minutes
> Question -  Is this good enough ?
> each machine has 32 cores and 32GB ram with 7*600GB harddisks. HBASE's heap
> has been configured to 8GB.
> If the put speed is low, how can i improve them..?
>

Raj, how big is each record?
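
For context, the original numbers pin down the rate but not the volume:
175,000,000 records in 4h45m (17,100 seconds) is roughly 10,200 puts/sec
cluster-wide, or about 2,000 puts/sec per node across the 5 machines.
Whether that is slow depends on how many bytes each put carries, which is
what the question above is after.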



>
> I tried tweaking the TableOutputFormat by increasing the WriteBufferSize to
> 24MB, and adding the multi put feature (by adding 10,000 puts in ArrayList
> and putting it as a batch).  After doing this,  it started throwing
>
>

Re: Best Hbase Storage for PIG

Posted by Doug Meil <do...@explorysmedical.com>.
Hi there, as a sanity check with respect to writing, have you
double-checked this section of the RefGuide?

http://hbase.apache.org/book.html#perf.writing

... regarding pre-created regions and monotonically increasing keys?

Also, as a sanity check, refer to this case study as a diagnostic roadmap:

http://hbase.apache.org/book.html#casestudies.perftroub
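
On the monotonically increasing keys point, a common fix is to salt the
row key so sequential keys fan out across pre-split regions, and to buffer
puts on the client. A minimal sketch, assuming 10 salt buckets and the
'index' table from this thread; the 'cf' family, bucket count, and buffer
size are illustrative, not recommendations:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedWriter {
  // Match the number of pre-split regions.
  private static final int SALT_BUCKETS = 10;

  // Sequential keys (timestamps, counters) all land in one region;
  // a salt prefix derived from the key spreads them over SALT_BUCKETS.
  static byte[] saltedKey(String key) {
    int bucket = (key.hashCode() & Integer.MAX_VALUE) % SALT_BUCKETS;
    return Bytes.toBytes(String.format("%02d-%s", bucket, key));
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "index");
    table.setAutoFlush(false);                  // buffer puts client-side
    table.setWriteBufferSize(12 * 1024 * 1024); // flush in ~12 MB batches
    try {
      for (long i = 0; i < 1000; i++) {
        Put put = new Put(saltedKey(Long.toString(i)));
        put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
        table.put(put);
      }
    } finally {
      table.close(); // flushes any puts still sitting in the buffer
    }
  }
}

The tradeoff is that every range scan now has to fan out across all salt
buckets, so salting only pays off for write-heavy tables.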




On 4/26/12 7:38 AM, "Rajgopal Vaithiyanathan" <ra...@gmail.com> wrote:

>Hey all,
>
>The default - HBaseStorage() takes hell lot of time for puts.
>
>In a cluster of 5 machines, insertion of 175 Million records took 4Hours
>45
>minutes
>Question -  Is this good enough ?
>each machine has 32 cores and 32GB ram with 7*600GB harddisks. HBASE's
>heap
>has been configured to 8GB.
>If the put speed is low, how can i improve them..?
>
>I tried tweaking the TableOutputFormat by increasing the WriteBufferSize
>to
>24MB, and adding the multi put feature (by adding 10,000 puts in ArrayList
>and putting it as a batch).  After doing this,  it started throwing
>
>[detailed log and stack traces snipped]