Posted to user@pig.apache.org by Dmitriy Ryaboy <dv...@gmail.com> on 2010/07/01 18:41:58 UTC

Re: Looking for an example of using HBaseStorage with Pig

It looks like the HBase configs aren't completely on the classpath. I assume
you are able to query your HBase tables from the HBase shell?
Are you sure the path to the HBase conf directory is on the classpath you
pass to Pig?
For testing, Pig's classpath is easy to manage with the PIG_CLASSPATH
environment variable.
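
For example, something along these lines before starting grunt. This is only
a sketch: the /usr/local/hbase location is a guess, so point HBASE_CONF_DIR at
whichever directory actually holds your hbase-site.xml (and zoo.cfg, if your
setup uses one).

```shell
# Hypothetical install location; override HBASE_HOME or HBASE_CONF_DIR
# in your environment to match your own setup.
HBASE_HOME="${HBASE_HOME:-/usr/local/hbase}"
HBASE_CONF_DIR="${HBASE_CONF_DIR:-$HBASE_HOME/conf}"

# Put the conf directory first on Pig's classpath, keeping any entries
# that were already in PIG_CLASSPATH after it.
export PIG_CLASSPATH="$HBASE_CONF_DIR${PIG_CLASSPATH:+:$PIG_CLASSPATH}"

# Sanity check: warn if the directory does not contain hbase-site.xml.
if [ ! -f "$HBASE_CONF_DIR/hbase-site.xml" ]; then
  echo "warning: no hbase-site.xml in $HBASE_CONF_DIR" >&2
fi

echo "PIG_CLASSPATH=$PIG_CLASSPATH"
```

Then launch pig from the same shell so it inherits the variable.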

-D

On Wed, Jun 30, 2010 at 11:05 AM, Pavel Gutin <pa...@gmail.com> wrote:

> I moved forward, but I am still not able to do anything.
> Here's what I am doing:
> ================================
> grunt> register /usr/local/hadoop/pigtmp/pig-0.6.0/lib/ElephantBird.jar
> grunt> register /usr/local/hadoop/pigtmp/pig-0.6.0/lib/hbase-0.20.0.jar
> grunt> register /usr/local/hadoop/pigtmp/pig-0.6.0/lib/hbase-0.20.0-test.jar
> grunt> register /usr/local/hadoop/pigtmp/pig-0.6.0/lib/zookeeper-hbase-1329.jar
> grunt> a = load 'hbase://silk1' USING com.twitter.elephantbird.pig.load.HBaseLoader('f1:destination_port') AS (destination_port);
> 2010-06-30 13:44:02,583 [main] INFO  com.twitter.elephantbird.pig.load.HBaseLoader - no-arg constructor
> 2010-06-30 13:44:02,602 [main] INFO  com.twitter.elephantbird.pig.load.HBaseLoader - no-arg constructor
> grunt> dump a;
> 2010-06-30 13:44:05,321 [main] INFO  com.twitter.elephantbird.pig.load.HBaseLoader - no-arg constructor
> 2010-06-30 13:44:05,337 [main] INFO  com.twitter.elephantbird.pig.load.HBaseLoader - no-arg constructor
> 2010-06-30 13:44:05,452 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
> 2010-06-30 13:44:05,452 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
> 2010-06-30 13:44:08,361 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
> 2010-06-30 13:44:08,394 [Thread-12] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
> 2010-06-30 13:44:08,597 [Thread-12] INFO  com.twitter.elephantbird.pig.load.HBaseLoader - no-arg constructor
> 2010-06-30 13:44:08,597 [Thread-12] INFO  com.twitter.elephantbird.pig.load.HBaseLoader - tablename: hbase://silk1
> 2010-06-30 13:44:08,651 [Thread-12] ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper - no clientPort found in zoo.cfg
> 2010-06-30 13:44:08,652 [Thread-12] ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper - no clientPort found in zoo.cfg
> 2010-06-30 13:44:08,652 [Thread-12] ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper - no clientPort found in zoo.cfg
> 2010-06-30 13:44:08,653 [Thread-12] ERROR org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper - no clientPort found in zoo.cfg
> 2010-06-30 13:44:09,390 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Cannot get jobid for this job
> 2010-06-30 13:44:09,391 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
> 2010-06-30 13:44:09,391 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map reduce job(s) failed!
> 2010-06-30 13:44:09,402 [main] ERROR org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed to produce result in: "hdfs://fchadoop01:54310/tmp/temp642740681/tmp668990886"
> 2010-06-30 13:44:09,403 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
> 2010-06-30 13:44:09,405 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: hbase://silk1
> Details at logfile: /usr/local/hadoop/pigtmp/pig-0.6.0/pig_1277919819066.log
> grunt>
> ================================
>
> Here's the log file
>
>
> ================================
> Backend error message during job submission
> -------------------------------------------
> org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: hbase://silk1
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:269)
>        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
>        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
>        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
>        at org.apache.hadoop.mapred.jobcontrol.Job.submit(Job.java:378)
>        at org.apache.hadoop.mapred.jobcontrol.JobControl.startReadyJobs(JobControl.java:247)
>        at org.apache.hadoop.mapred.jobcontrol.JobControl.run(JobControl.java:279)
>        at java.lang.Thread.run(Thread.java:619)
> Caused by: java.io.IOException: Could not read quorum servers from zoo.cfg
>        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.<init>(ZooKeeperWrapper.java:81)
>        at org.apache.hadoop.hbase.client.HConnectionManager$ClientZKWatcher.getZooKeeperWrapper(HConnectionManager.java:199)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getZooKeeperWrapper(HConnectionManager.java:878)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRootRegion(HConnectionManager.java:894)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:573)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:686)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:582)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:555)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:686)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:586)
>        at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:549)
>        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:125)
>        at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:103)
>        at com.twitter.elephantbird.pig.load.HBaseLoader.ensureTable(Unknown Source)
>        at com.twitter.elephantbird.pig.load.HBaseLoader.validate(Unknown Source)
>        at org.apache.pig.impl.io.ValidatingInputFileSpec.validate(ValidatingInputFileSpec.java:59)
>        at org.apache.pig.impl.io.ValidatingInputFileSpec.<init>(ValidatingInputFileSpec.java:44)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:240)
>        ... 7 more
>
> Pig Stack Trace
> ---------------
> ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: hbase://silk1
>
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to open iterator for alias a
>        at org.apache.pig.PigServer.openIterator(PigServer.java:482)
>        at org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:539)
>        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:241)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
>        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
>        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
>        at org.apache.pig.Main.main(Main.java:352)
> Caused by: org.apache.pig.backend.executionengine.ExecException: ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Unable to create input slice for: hbase://silk1
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:176)
>        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:253)
>        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:249)
>        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:781)
>        at org.apache.pig.PigServer.store(PigServer.java:529)
>        at org.apache.pig.PigServer.openIterator(PigServer.java:465)
>        ... 6 more
> ================================
>
> I really appreciate all your help.
>
>
> On Tue, Jun 29, 2010 at 2:51 PM, Dmitriy Ryaboy <dv...@gmail.com>
> wrote:
>
> > you need to register the jar before using it:
> >
> > register /path/to/elephant-bird/eb.jar
> >
> > On Tue, Jun 29, 2010 at 11:47 AM, Pavel Gutin <pa...@gmail.com>
> > wrote:
> >
> > > I have been trying it with 0.7.
> > > I switched to 0.6 but I get the following error
> > >
> > > Could not resolve com.twitter.elephantbird.pig.load.HBaseLoader using
> > > imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
> > >
> > > Where do I need to add the JAR in my config in order to get Pig to see it?
> > >
> > >
> > > On Tue, Jun 29, 2010 at 12:43 PM, Dmitriy Ryaboy <dv...@gmail.com>
> > > wrote:
> > >
> > > > What version of pig are you running?
> > > > Elephant Bird works with Pig 0.6, not 0.7
> > > >
> > > > There is another version on the Jira that works with 0.7 but has a few
> > > > drawbacks (the big one being that it expects everything to be strings)
> > > > -D
> > > >
> > > > On Tue, Jun 29, 2010 at 8:10 AM, Pavel Gutin <pa...@gmail.com> wrote:
> > > >
> > > > > Thank you for trying to help me out. Here's the error that's in my log file
> > > > >
> > > > > ERROR 2998: Unhandled internal error. org/apache/pig/Slicer
> > > > >
> > > > > java.lang.NoClassDefFoundError: org/apache/pig/Slicer
> > > > >        at java.lang.ClassLoader.defineClass1(Native Method)
> > > > >        at java.lang.ClassLoader.defineClassCond(ClassLoader.java:632)
> > > > >        at java.lang.ClassLoader.defineClass(ClassLoader.java:616)
> > > > >        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:141)
> > > > >        at java.net.URLClassLoader.defineClass(URLClassLoader.java:283)
> > > > >        at java.net.URLClassLoader.access$000(URLClassLoader.java:58)
> > > > >        at java.net.URLClassLoader$1.run(URLClassLoader.java:197)
> > > > >        at java.security.AccessController.doPrivileged(Native Method)
> > > > >        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> > > > >        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:296)
> > > > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> > > > >        at java.lang.Class.forName0(Native Method)
> > > > >        at java.lang.Class.forName(Class.java:247)
> > > > >        at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:422)
> > > > >        at org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:452)
> > > > >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.NonEvalFuncSpec(QueryParser.java:5087)
> > > > >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.LoadClause(QueryParser.java:1434)
> > > > >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1245)
> > > > >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:911)
> > > > >        at org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:700)
> > > > >        at org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
> > > > >        at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1164)
> > > > >        at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1114)
> > > > >        at org.apache.pig.PigServer.registerQuery(PigServer.java:425)
> > > > >        at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:737)
> > > > >        at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
> > > > >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:162)
> > > > >        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:138)
> > > > >        at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
> > > > >        at org.apache.pig.Main.main(Main.java:357)
> > > > > Caused by: java.lang.ClassNotFoundException: org.apache.pig.Slicer
> > > > >        at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> > > > >        at java.security.AccessController.doPrivileged(Native Method)
> > > > >        at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> > > > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> > > > >        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
> > > > >        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> > > > >        ... 32 more
> > > > >
> > > > >
> > > > > On Mon, Jun 28, 2010 at 7:56 PM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:
> > > > >
> > > > > > Pavel,
> > > > > > What does the log say?
> > > > > >
> > > > > > I am guessing you need to a) make sure that all the hbase config stuff is on
> > > > > > the classpath and b) load 'hbase://silk1'  (no host:port)
> > > > > >
> > > > > > -D
> > > > > >
> > > > > > On Mon, Jun 28, 2010 at 10:37 AM, Pavel Gutin <pavelgutin@gmail.com> wrote:
> > > > > >
> > > > > > > My apologies, I pasted the wrong line. I was testing to see if Pig was able
> > > > > > > to locate the JAR by misspelling the name on purpose.
> > > > > > >
> > > > > > > Here's the correct error.
> > > > > > >
> > > > > > > grunt> a = load 'hbase://localhost:60000/silk1' USING com.twitter.elephantbird.pig.load.HBaseLoader('f1:destination_port') AS (destination_port);
> > > > > > > 2010-06-28 13:19:01,288 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2998: Unhandled internal error. org/apache/pig/Slicer
> > > > > > > Details at logfile: /usr/local/hadoop/pigtmp/pig-0.7.0/pig_1277744862785.log
> > > > > > > grunt>
> > > > > > >
> > > > > > >
> > > > > > > On Mon, Jun 28, 2010 at 1:34 PM, Pavel Gutin <pavelgutin@gmail.com> wrote:
> > > > > > >
> > > > > > > > This seems like it might work for me. I downloaded it, compiled it, and
> > > > > > > > added the JAR to PIG_CLASSPATH.
> > > > > > > >
> > > > > > > > However, when I try to run the following command, I get an error:
> > > > > > > >
> > > > > > > > grunt> a = load 'hbase://myTable' USING co.twitter.elephantbird.pig.load.HBaseLoader('f1:col1') AS (col1);
> > > > > > > > 2010-06-28 13:13:59,607 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1070: Could not resolve co.twitter.elephantbird.pig.load.HBaseLoader using imports: [, org.apache.pig.builtin., org.apache.pig.impl.builtin.]
> > > > > > > > Details at logfile: /usr/local/hadoop/pigtmp/pig-0.7.0/pig_1277744862785.log
> > > > > > > > grunt>
> > > > > > > >
> > > > > > > > I have a feeling I am not referencing the table the right way.
> > > > > > > >
> > > > > > > > On Mon, Jun 28, 2010 at 11:43 AM, Dmitriy Ryaboy <dvryaboy@gmail.com> wrote:
> > > > > > > >
> > > > > > > >> There's an HBase LoadFunc that works with 0.6 in Elephant-Bird.
> > > > > > > >> http://github.com/kevinweil/elephant-bird
> > > > > > > >>
> > > > > > > >> There are slides here that show usage:
> > > > > > > >>
> > > > > > > >> http://squarecog.wordpress.com/2010/05/20/pig-hbase-hadoop-and-twitter-hug-talk-slides/
> > > > > > > >>
> > > > > > > >> -D
> > > > > > > >>
> > > > > > > >> On Mon, Jun 28, 2010 at 7:59 AM, Pavel Gutin <pavelgutin@gmail.com> wrote:
> > > > > > > >>
> > > > > > > >> > I am trying to get Pig to query my HBase table, but I cannot find any
> > > > > > > >> > examples on the web. Can anyone provide me with a simple example?
> > > > > > > >> >
> > > > > > > >> > The best I could find so far was a little blurb on the following page,
> > > > > > > >> > http://wiki.apache.org/pig/PigMultiQueryPerformanceSpecification
> > > > > > > >> > but that didn't help much.
> > > > > > > >> >
> > > > > > > >> > Thanks in advance.
> > > > > > > >> >
> > > > > > > >> >  - Pavel
> > > > > > > >> >
> > > > > > > >>
> > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>