Posted to user@pig.apache.org by Dmitry Demeshchuk <de...@gmail.com> on 2010/07/21 13:10:00 UTC

Using Pig with HBase

Greetings.

I'm trying to query HBase using Pig, but I'm doing something wrong and
can't figure out what exactly.

1. First, I create a table in HBase:

hbase(main):001:0> create 'test_table', 'test_family'

and add values to it:

hbase(main):002:0> put 'test_table', '1', 'test_family:body', 'body1'
hbase(main):003:0> put 'test_table', '1', 'test_family:value', 'value1'
hbase(main):009:0> scan 'test_table'

ROW                          COLUMN+CELL
 1                           column=test_family:body,
timestamp=1279710032517, value=body1
 1                           column=test_family:value,
timestamp=1279710094584, value=value1

So now I have some data in the table.


2. After that, I try to get data from HBase using Pig:

grunt> A = load 'test_table' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_family:body
test_family:value');
grunt> DUMP A;

Then I get an error message:

2010-07-21 06:01:58,387 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 2118: Unable to create input splits for: test_table



Could you please help me figure out where I'm going wrong?

Thank you.

-- 
Best regards,
Dmitry Demeshchuk

Re: Using Pig with HBase

Posted by Dmitry Demeshchuk <de...@gmail.com>.
I tried Pig 0.7 with HBase 0.20.5. The HBase jar bundled with the Pig
source is for 0.20.0, but I hoped that wouldn't make a big difference.
As for elephant-bird, I downloaded it, but it didn't work for me; I
guess that's a version problem, since you mentioned Pig 0.6.
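For reference, the usual fix for a client/server version mismatch is to make the client jar match the running server. Below is a rough sketch of the jar swap; the paths are hypothetical (the real ones would be your Pig lib directory and your HBase install), and mock directories are created here so the steps run end to end:

```shell
#!/bin/sh
# Sketch of replacing Pig's bundled HBase 0.20.0 jar with the jar from
# the HBase 0.20.5 installation actually being run. Mock directories
# stand in for $PIG_HOME/lib and $HBASE_HOME.
PIG_LIB=$(mktemp -d)/pig-0.7.0/lib
HBASE_HOME=$(mktemp -d)/hbase-0.20.5
mkdir -p "$PIG_LIB" "$HBASE_HOME"
touch "$PIG_LIB/hbase-0.20.0.jar" "$PIG_LIB/hbase-0.20.0-test.jar"
touch "$HBASE_HOME/hbase-0.20.5.jar"

# Remove the stale 0.20.0 jars Pig ships with...
rm "$PIG_LIB"/hbase-0.20.0*.jar
# ...and drop in the jar matching the server.
cp "$HBASE_HOME/hbase-0.20.5.jar" "$PIG_LIB/"

ls "$PIG_LIB"
```

After the swap, Pig needs to be restarted so the new jar is picked up on the classpath.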


Here's the full dump from console:

grunt> A = load 'test_table' using
org.apache.pig.backend.hadoop.hbase.HBaseStorage('test_family:body
test_family:value');
grunt> DUMP A;
2010-07-21 14:27:37,414 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics
with processName=JobTracker, sessionId=
2010-07-21 14:27:37,474 [main] INFO
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine -
(Name: Store(file:/tmp/temp-2060573535/tmp1000611667:org.apache.pig.builtin.BinStorage)
- 1-4 Operator Key: 1-4)
2010-07-21 14:27:37,507 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size before optimization: 1
2010-07-21 14:27:37,507 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer
- MR plan size after optimization: 1
2010-07-21 14:27:37,527 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:37,533 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:37,534 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- mapred.job.reduce.markreset.buffer.percent is not set, set to
default 0.3
2010-07-21 14:27:38,995 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler
- Setting up single store job
2010-07-21 14:27:39,040 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:39,041 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map-reduce job(s) waiting for submission.
2010-07-21 14:27:39,049 [Thread-5] WARN
org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
2010-07-21 14:27:39,164 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:39,183 [Thread-5] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:zookeeper.version=3.2.0--1, built on 05/15/2009 06:05 GMT
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:host.name=docspider.pravo.ru
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.version=1.6.0_20
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:java.vendor=Sun
Microsystems Inc.
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.home=/usr/lib/jvm/java-6-sun-1.6.0.20/jre
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.class.path=/home/dem/pig-0.7.0/bin/../conf:/usr/lib/jvm/java-6-sun-1.6.0.20//lib/tools.jar:/home/dem/pig-0.7.0/bin/../build/classes:/home/dem/pig-0.7.0/bin/../build/test/classes:/home/dem/pig-0.7.0/bin/../pig-0.7.0-core.jar:/home/dem/pig-0.7.0/bin/../build/pig-0.7.1-dev-core.jar:/home/dem/pig-0.7.0/bin/../lib/automaton.jar:/home/dem/pig-0.7.0/bin/../lib/hadoop20.jar:/home/dem/pig-0.7.0/bin/../lib/hbase-0.20.0.jar:/home/dem/pig-0.7.0/bin/../lib/hbase-0.20.0-test.jar:/home/dem/pig-0.7.0/bin/../lib/zookeeper-hbase-1329.jar
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.library.path=/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/amd64/server:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/lib/amd64:/usr/lib/jvm/java-6-sun-1.6.0.20/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:java.io.tmpdir=/tmp
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:java.compiler=<NA>
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:os.name=Linux
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:os.arch=amd64
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:os.version=2.6.31-20-server
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client environment:user.name=dem
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:user.home=/home/dem
2010-07-21 14:27:39,290 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Client
environment:user.dir=/home/dem/pig-0.7.0
2010-07-21 14:27:39,291 [Thread-5] INFO
org.apache.zookeeper.ZooKeeper - Initiating client connection,
host=localhost:2181 sessionTimeout=60000
watcher=org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher@6536d9d8
2010-07-21 14:27:39,292 [Thread-5] INFO
org.apache.zookeeper.ClientCnxn - zookeeper.disableAutoWatchReset is
false
2010-07-21 14:27:39,315 [Thread-5-SendThread] INFO
org.apache.zookeeper.ClientCnxn - Attempting connection to server
localhost/0:0:0:0:0:0:0:1:2181
2010-07-21 14:27:39,318 [Thread-5-SendThread] INFO
org.apache.zookeeper.ClientCnxn - Priming connection to
java.nio.channels.SocketChannel[connected local=/0:0:0:0:0:0:0:1:43793
remote=localhost/0:0:0:0:0:0:0:1:2181]
2010-07-21 14:27:39,321 [Thread-5-SendThread] INFO
org.apache.zookeeper.ClientCnxn - Server connection successful
2010-07-21 14:27:39,404 [Thread-5] ERROR
org.apache.hadoop.hbase.mapreduce.TableInputFormat -
java.lang.reflect.UndeclaredThrowableException
	at $Proxy0.getRegionInfo(Unknown Source)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:931)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:573)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:549)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:623)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:582)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:549)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:623)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586)
	at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:549)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:125)
	at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:103)
	at org.apache.hadoop.hbase.mapreduce.TableInputFormat.setConf(TableInputFormat.java:73)
	at org.apache.pig.backend.hadoop.hbase.HBaseStorage.getInputFormat(HBaseStorage.java:96)
	at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:257)
	at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
	at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
	at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
	at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
	at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.hadoop.ipc.RemoteException: java.io.IOException:
java.io.IOException: Could not find requested method, the usual cause
is a version mismatch between client and server.
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
	at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)

	at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:723)
	at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:328)
	... 22 more

2010-07-21 14:27:39,542 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 0% complete
2010-07-21 14:27:39,542 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 100% complete
2010-07-21 14:27:39,542 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- 1 map reduce job(s) failed!
2010-07-21 14:27:39,548 [main] ERROR
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed to produce result in:
"file:/tmp/temp-2060573535/tmp1000611667"
2010-07-21 14:27:39,548 [main] INFO
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
- Failed!
2010-07-21 14:27:39,565 [main] INFO
org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM
Metrics with processName=JobTracker, sessionId= - already initialized
2010-07-21 14:27:39,568 [main] ERROR org.apache.pig.tools.grunt.Grunt
- ERROR 2118: Unable to create input splits for: test_table
Details at logfile: /home/dem/pig-0.7.0/pig_1279740439519.log


And the details from the log file:

Pig Stack Trace
---------------
ERROR 2118: Unable to create input splits for: test_table

org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable
to open iterator for alias A
        at org.apache.pig.PigServer.openIterator(PigServer.java:521)
        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:544)
        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
        at org.apache.pig.Main.main(Main.java:357)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
6015: During execution, encountered a Hadoop error.
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
        at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
        at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR
2118: Unable to create input splits for: test_table
        ... 8 more
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hbase.mapreduce.TableInputFormatBase.getSplits(TableInputFormatBase.java:273)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:258)
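The java.class.path line in the console dump already tells the story: splitting the classpath on ':' makes the bundled jar versions easy to read. A quick sketch, using the classpath string abridged from the log above to just the jars relevant to the version question:

```shell
# The java.class.path value logged by the ZooKeeper client above,
# abridged to the Hadoop/HBase jar entries.
CLASSPATH='/home/dem/pig-0.7.0/bin/../lib/hadoop20.jar:/home/dem/pig-0.7.0/bin/../lib/hbase-0.20.0.jar:/home/dem/pig-0.7.0/bin/../lib/hbase-0.20.0-test.jar:/home/dem/pig-0.7.0/bin/../lib/zookeeper-hbase-1329.jar'

# One entry per line, keeping only the HBase jars: the client is 0.20.0
# while the server is 0.20.5, which matches the RPC complaint about a
# "version mismatch between client and server".
echo "$CLASSPATH" | tr ':' '\n' | grep 'hbase-0.20'
```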


On Wed, Jul 21, 2010 at 10:52 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:



-- 
Best regards,
Dmitry Demeshchuk

Re: Using Pig with HBase

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Which version of Pig are you using? If 0.6, have you tried the elephant bird
HBase loader?
Is there a more detailed stack trace in the pig log?

-Dmitriy

