Posted to user@pig.apache.org by pig <ha...@gmail.com> on 2010/09/21 23:22:41 UTC

Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Hello,

I have a small cluster up and running with LZO compressed files in it.  I'm
using the lzo compression libraries available at
http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)

So far everything works fine when I write regular map-reduce jobs.  I can
read in lzo files and write out lzo files without any problem.

I'm also using Pig 0.7 and it appears to be able to read LZO files out of
the box using the default LoadFunc (PigStorage).  However, I am currently
testing a large LZO file (20GB) which I indexed using the LzoIndexer and Pig
does not appear to be making use of the indexes.  The pig scripts that I've
run so far only have 3 mappers when processing the 20GB file.  My
understanding was that there should be 1 map for each block (256MB blocks)
so about 80 mappers when processing the 20GB lzo file.  Does Pig 0.7 support
indexed lzo files with the default load function?

If not, I was looking at elephant-bird and noticed it is only compatible
with Pig 0.6 and not 0.7+.  Is that accurate?  What would be the recommended
solution for processing indexed lzo files using Pig 0.7?

Thank you for any assistance!

~Ed
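
For reference, the indexing step and the expected split count work out as
follows. The paths are illustrative, not taken from the thread; the
LzoIndexer class name comes from the hadoop-lzo project linked above:

```shell
# Illustrative paths -- adjust the jar location and lzo file name for your
# cluster.  Indexing a .lzo file with hadoop-lzo's LzoIndexer is what makes
# it splittable:
#
#   hadoop jar /usr/lib/hadoop/lib/hadoop-lzo.jar \
#       com.hadoop.compression.lzo.LzoIndexer /user/foo/input/file.lzo
#
# Once the index is actually used, roughly one map task per HDFS block is
# expected:
FILE_GB=20
BLOCK_MB=256
echo "expected mappers: $(( FILE_GB * 1024 / BLOCK_MB ))"
```

With 256MB blocks this gives the "about 80 mappers" figure quoted above, so
3 mappers is a strong sign the index is being ignored.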

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by pig <ha...@gmail.com>.
Hi Dmitriy,

Using the REGISTER pig keyword got rid of the missing class error.  Thank
you!

I still have the error regarding the lzo codec missing.

I followed all the steps outlined by Gerrit and LZO works without any
problems when I'm using it in java based map-reduce programs (including
outputting compressed lzo files).  However, for some reason I still have the
problem with Pig.  I added the hadoop-kevinweil-gpl-compression.jar to my
$PIG_HOME/lib directory on all nodes and on the machine I'm running pig
from.  The native libraries are also in the correct location in the
hadoop/lib/native/Linux-amd64 folder (libgplcompression.so and
libhadoop.so.1.0.0).

I'm assuming that pig will pick up the JAVA_LIBRARY_PATH variable set in
hadoop-env.sh.  Is that correct?  Thank you!

~Ed
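
If Pig is not picking java.library.path up from hadoop-env.sh, one
workaround is to pass it to Pig's JVM directly. The path below is a guess at
a typical hadoop-lzo layout, and the script name is hypothetical:

```shell
# Sketch of a workaround, under the assumption that the bin/pig launcher
# reads PIG_OPTS and passes it to the client JVM.  The native-library path
# is illustrative -- point it at wherever libgplcompression.so lives.
export PIG_OPTS="-Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64"
pig myscript.pig   # hypothetical script name
```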

On Wed, Sep 22, 2010 at 5:44 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> By register I mean the pig register keyword.
>
> So, in addition to
>
> REGISTER elephant-bird-1.0.jar
>
> you should also
>
> REGISTER /usr/lib/elephant-pig/lib/google-collections-1.0.jar
>
> and possibly the rest of the jars in that directory. Might be simpler to
> jar them up together and just register a single jar.
>
>
> -D
>
> On Wed, Sep 22, 2010 at 1:47 PM, pig <ha...@gmail.com> wrote:
>
> > I added the jars to all my nodes in /usr/lib/elephant-pig/lib
> >
> > I then modified hadoop-env.sh for all nodes so that it includes the entry
> >
> >     export PIG_CLASSPATH=/usr/lib/elephant-pig/lib/*:$PIG_CLASSPATH
> >
> > I start up the grunt shell and first paste the line:
> >
> >     REGISTER elephant-bird-1.0.jar
> >
> > This has no problems.  Then I add the line:
> >
> >     A = LOAD '/user/foo/input' USING
> > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');
> >
> > At this point the following error prints to screen:
> >
> > --------------------
> > [main] ERROR com.hadoop.compression.lzo.GPLNativeCodeLoader - Could not
> > load
> > native gpl library
> > java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
> > ...
> > [main] ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
> > without native-hadoop
> > --------------------
> >
> > No log entry is generated and the grunt shell continues to work.  (LZO
> > works
> > fine when I run java based map-reduce programs). I then add the final 2
> > lines of the pig script:
> >
> >     B=LIMIT A 100;
> >     DUMP B;
> >
> > The program starts to execute and then fails.  The nodes running the
> > mappers throw java.lang.ClassNotFoundException:
> > com.google.common.collect.Maps (the same error I was getting before in
> > my pig log files).  The class-not-found exception no longer shows up in
> > my pig log file; in its place is a more generic RuntimeException.
> >
> > On all nodes I also tried
> >
> >     export PIG_CLASSPATH=/usr/lib/elephant-pig/lib:$PIG_CLASSPATH
> >
> > (without the *)
> >
> > and I also tried modifying JAVA_LIBRARY_PATH to include the location of
> the
> > elephant-pig jar files.
> >
> > I'm using the Cloudera distro of Hadoop 0.20.2, if that might somehow be
> > causing problems.  When you said I might need to "register" the jar
> > files, what does that mean exactly?  Thanks again for all your
> > assistance and prompt responses.
> >
> > ~Ed
> >
> > On Wed, Sep 22, 2010 at 3:46 PM, pig <ha...@gmail.com> wrote:
> >
> > > Ah,
> > >
> > > I didn't realize I need to put the jars on all the nodes since the
> error
> > is
> > > being thrown before the pig script actually executes (it's throwing the
> > > error in the parsing stage).  I assumed since the pig script hasn't
> > executed
> > > yet it wasn't doing anything with the Hadoop nodes.
> > >
> > > I will try adding PIG_CLASSPATH to my hadoop-env.sh and will then put
> the
> > > jar files on all the slave nodes.  Hopefully that will solve the
> problem.
> > >
> > > ~Ed
> > >
> > >
> > > On Wed, Sep 22, 2010 at 3:28 PM, Dmitriy Ryaboy <dvryaboy@gmail.com
> > >wrote:
> > >
> > >> try PIG_CLASSPATH
> > >>
> > >> Oh and you might need to explicitly register them.. sorry, forgot. We
> > just
> > >> have them on the hadoop classpath on the nodes themselves, so we don't
> > >> have
> > >> to do that, but you might if you are starting fresh.
> > >>
> > >> -D
> > >>
> > >> On Wed, Sep 22, 2010 at 12:01 PM, pig <ha...@gmail.com> wrote:
> > >>
> > >> > [foo]$ echo $CLASSPATH
> > >> > :/usr/lib/elephant-bird/lib/*
> > >> >
> > >> > This has been set for both user foo and hadoop but I still get the
> > same
> > >> > error.  Is this the correct environment variable to be setting?
> > >> >
> > >> > Thank you!
> > >> >
> > >> > ~Ed
> > >> >
> > >> >
> > >> > On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dvryaboy@gmail.com
> >
> > >> > wrote:
> > >> >
> > >> > > elephant-bird/lib/* (the * is important)
> > >> > >
> > >> > > On Wed, Sep 22, 2010 at 11:42 AM, pig <ha...@gmail.com>
> wrote:
> > >> > >
> > >> > > > Well I thought that would be a simple enough fix but no luck so
> > far.
> > >> > > >
> > >> > > > I've added the elephant-bird/lib directory (which I made world
> > >> readable
> > >> > > and
> > >> > > > executable) to the CLASSPATH, JAVA_LIBRARY_PATH and
> > HADOOP_CLASSPATH
> > >> as
> > >> > > > both
> > >> > > > the user running grunt and the hadoop user. (sort of a shotgun
> > >> > approach)
> > >> > > >
> > >> > > > I still get the error where it complains about no gplcompression
> > and
> > >> in
> > >> > > the
> > >> > > > log it has an error where it can't find
> > >> com.google.common.collect.Maps
> > >> > > >
> > >> > > > Are these two separate problems or is it one problem that is
> > causing
> > >> > two
> > >> > > > different errors?  Thank you for the help!
> > >> > > >
> > >> > > > ~Ed
> > >> > > >
> > >> > > > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <
> > dvryaboy@gmail.com
> > >> >
> > >> > > > wrote:
> > >> > > >
> > >> > > > > You need the jars in elephant-bird's lib/ on your classpath to
> > run
> > >> > > > > Elephant-Bird.
> > >> > > > >
> > >> > > > >
> > >> > > > > On Wed, Sep 22, 2010 at 10:35 AM, pig <ha...@gmail.com>
> > >> wrote:
> > >> > > > >
> > >> > > > > > Thank you for pointing out the 0.7 branch.   I'm giving the
> > 0.7
> > >> > > branch
> > >> > > > a
> > >> > > > > > shot and have run into a problem when trying to run the
> > >> following
> > >> > > test
> > >> > > > > pig
> > >> > > > > > script:
> > >> > > > > >
> > >> > > > > > REGISTER elephant-bird-1.0.jar
> > >> > > > > > A = LOAD '/user/foo/input' USING
> > >> > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> > >> > > > > > B = LIMIT A 100;
> > >> > > > > > DUMP B;
> > >> > > > > >
> > >> > > > > > When I try to run this I get the following error:
> > >> > > > > >
> > >> > > > > > java.lang.UnsatisfiedLinkError: no gplcompression in
> > >> > > java.library.path
> > >> > > > > >  ....
> > >> > > > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load
> > >> native-lzo
> > >> > > > > without
> > >> > > > > > native-hadoop
> > >> > > > > > ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999:
> Unexpected
> > >> > > internal
> > >> > > > > > error.  could not instantiate
> > >> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
> > >> > arguments
> > >> > > > '[
> > >> > > > > > ]'
> > >> > > > > >
> > >> > > > > > Looking at the log file it gives the following:
> > >> > > > > >
> > >> > > > > > java.lang.RuntimeException: could not instantiate
> > >> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
> > >> > arguments
> > >> > > > '[
> > >> > > > > > ]'
> > >> > > > > > ...
> > >> > > > > > Caused by: java.lang.reflect.InvocationTargetException
> > >> > > > > > ...
> > >> > > > > > Caused by: java.lang.NoClassDefFoundError:
> > >> > > > com/google/common/collect/Maps
> > >> > > > > > ...
> > >> > > > > > Caused by: java.lang.ClassNotFoundException:
> > >> > > > > com.google.common.collect.Maps
> > >> > > > > >
> > >> > > > > > What is confusing me is that LZO compression and
> decompression
> > >> > works
> > >> > > > fine
> > >> > > > > > when I'm running a normal java based map reduce program so I
> > >> feel
> > >> > as
> > >> > > > > though
> > >> > > > > > the libraries have to be in the right place with the right
> > >> settings
> > >> > > for
> > >> > > > > > java.library.path.  Otherwise how would normal java
> map-reduce
> > >> > work?
> > >> > > >  Is
> > >> > > > > > there some other location I need to set JAVA_LIBRARY_PATH
> for
> > >> pig
> > >> > to
> > >> > > > pick
> > >> > > > > > it
> > >> > > > > > up?  My understanding was that it would get this from
> > >> > hadoop-env.sh.
> > >> > > >  Are
> > >> > > > > > the missing com.google.common.collect.Maps the real problem
> > >> here?
> > >> > > >  Thank
> > >> > > > > > you
> > >> > > > > > for any help!
> > >> > > > > >
> > >> > > > > > ~Ed
> > >> > > > > >
> > >> > > > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <
> > >> > dvryaboy@gmail.com>
> > >> > > > > > wrote:
> > >> > > > > >
> > >> > > > > > > Hi Ed,
> > >> > > > > > > Elephant-bird only works with 0.6 at the moment. There's a
> > >> branch
> > >> > > for
> > >> > > > > 0.7
> > >> > > > > > > that I haven't tested:
> > >> > http://github.com/hirohanin/elephant-bird/
> > >> > > > > > > Try it, let me know if it works.
> > >> > > > > > >
> > >> > > > > > > -D
> > >> > > > > > >

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
By register I mean the pig register keyword.

So, in addition to

REGISTER elephant-bird-1.0.jar

you should also

REGISTER /usr/lib/elephant-pig/lib/google-collections-1.0.jar

and possibly the rest of the jars in that directory. Might be simpler to jar
them up together and just register a single jar.
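
One way to jar them up together (directory and jar names here are
illustrative, not taken from the thread):

```shell
# Sketch: unpack every dependency jar into one directory and repackage as a
# single jar, so one REGISTER statement covers all of them.
# (META-INF entries will collide; that is usually harmless for plain
# dependency jars.)
mkdir -p /tmp/combined && cd /tmp/combined
for j in /usr/lib/elephant-pig/lib/*.jar; do jar xf "$j"; done
jar cf /tmp/elephant-bird-with-deps.jar .
```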


-D


Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by pig <ha...@gmail.com>.
I added the jars to all my nodes in /usr/lib/elephant-pig/lib

I then modified hadoop-env.sh for all nodes so that it includes the entry

     export PIG_CLASSPATH=/usr/lib/elephant-pig/lib/*:$PIG_CLASSPATH

I start up the grunt shell and first paste the line:

     REGISTER elephant-bird-1.0.jar

This has no problems.  Then I add the line:

     A = LOAD '/user/foo/input' USING
com.twitter.elephantbird.pig.load.LzoTokenizedLoader('|');

At this point the following error prints to screen:

--------------------
[main] ERROR com.hadoop.compression.lzo.GPLNativeCodeLoader - Could not load
native gpl library
java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
...
[main] ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo
without native-hadoop
--------------------

No log entry is generated and the grunt shell continues to work.  (LZO works
fine when I run java based map-reduce programs). I then add the final 2
lines of the pig script:

     B=LIMIT A 100;
     DUMP B;

The program starts to execute and then fails.  The nodes running the mappers
throw java.lang.ClassNotFoundException: com.google.common.collect.Maps
(the same error I was getting before in my pig log files).  The
class-not-found exception no longer shows up in my pig log file; in its
place is a more generic RuntimeException.

On all nodes I also tried

     export PIG_CLASSPATH=/usr/lib/elephant-pig/lib:$PIG_CLASSPATH

(without the *)
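
The distinction between the two matters in general Java classpath handling
(not specific to Pig): a bare directory entry only exposes loose .class
files, while the trailing /* wildcard, supported since Java 6, expands to
every jar in the directory:

```shell
# General Java 6+ classpath behavior, using the paths from this thread:
export PIG_CLASSPATH="/usr/lib/elephant-pig/lib/*:$PIG_CLASSPATH"  # jars are picked up
export PIG_CLASSPATH="/usr/lib/elephant-pig/lib:$PIG_CLASSPATH"    # only loose .class files
```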

and I also tried modifying JAVA_LIBRARY_PATH to include the location of the
elephant-pig jar files.

I'm using the Cloudera distro of Hadoop 0.20.2, if that might somehow be
causing problems.  When you said I might need to "register" the jar files,
what does that mean exactly?  Thanks again for all your assistance and
prompt responses.

~Ed

On Wed, Sep 22, 2010 at 3:46 PM, pig <ha...@gmail.com> wrote:

> Ah,
>
> I didn't realize I need to put the jars on all the nodes since the error is
> being thrown before the pig script actually executes (it's throwing the
> error in the parsing stage).  I assumed since the pig script hasn't executed
> yet it wasn't doing anything with the Hadoop nodes.
>
> I will try adding PIG_CLASSPATH to my hadoop-env.sh and will then put the
> jar files on all the slave nodes.  Hopefully that will solve the problem.
>
> ~Ed
>
>
> On Wed, Sep 22, 2010 at 3:28 PM, Dmitriy Ryaboy <dv...@gmail.com>wrote:
>
>> try PIG_CLASSPATH
>>
>> Oh and you might need to explicitly register them.. sorry, forgot. We just
>> have them on the hadoop classpath on the nodes themselves, so we don't
>> have
>> to do that, but you might if you are starting fresh.
>>
>> -D
>>
>> On Wed, Sep 22, 2010 at 12:01 PM, pig <ha...@gmail.com> wrote:
>>
>> > [foo]$ echo $CLASSPATH
>> > :/usr/lib/elephant-bird/lib/*
>> >
>> > This has been set for both user foo and hadoop but I still get the same
>> > error.  Is this the correct environment variable to be setting?
>> >
>> > Thank you!
>> >
>> > ~Ed
>> >
>> >
>> > On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dv...@gmail.com>
>> > wrote:
>> >
>> > > elephant-bird/lib/* (the * is important)
>> > >
>> > > On Wed, Sep 22, 2010 at 11:42 AM, pig <ha...@gmail.com> wrote:
>> > >
>> > > > Well I thought that would be a simple enough fix but no luck so far.
>> > > >
>> > > > I've added the elephant-bird/lib directory (which I made world
>> readable
>> > > and
>> > > > executable) to the CLASSPATH, JAVA_LIBRARY_PATH and HADOOP_CLASSPATH
>> as
>> > > > both
>> > > > the user running grunt and the hadoop user. (sort of a shotgun
>> > approach)
>> > > >
>> > > > I still get the error where it complains about nogplcompression and
>> in
>> > > the
>> > > > log it has an error where it can't find
>> com.google.common.collect.Maps
>> > > >
>> > > > Are these two separate problems or is it one problem that is causing
>> > two
>> > > > different errors?  Thank you for the help!
>> > > >
>> > > > ~Ed
>> > > >
>> > > > On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dvryaboy@gmail.com
>> >
>> > > > wrote:
>> > > >
>> > > > > You need the jars in elephant-bird's lib/ on your classpath to run
>> > > > > Elephant-Bird.
>> > > > >
>> > > > >
>> > > > > On Wed, Sep 22, 2010 at 10:35 AM, pig <ha...@gmail.com>
>> wrote:
>> > > > >
>> > > > > > Thank you for pointing out the 0.7 branch.   I'm giving the 0.7
>> > > branch
>> > > > a
>> > > > > > shot and have run into a problem when trying to run the
>> following
>> > > test
>> > > > > pig
>> > > > > > script:
>> > > > > >
>> > > > > > REGISTER elephant-bird-1.0.jar
>> > > > > > A = LOAD '/user/foo/input' USING
>> > > > > > com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
>> > > > > > B = LIMIT A 100;
>> > > > > > DUMP B;
>> > > > > >
>> > > > > > When I try to run this I get the following error:
>> > > > > >
>> > > > > > java.lang.UnsatisfiedLinkError: no gplcompression in
>> > > java.library.path
>> > > > > >  ....
>> > > > > > ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load
>> native-lzo
>> > > > > without
>> > > > > > native-hadoop
>> > > > > > ERROR org.apache.pig.tools.grunt.Grunt - EROR 2999: Unexpected
>> > > internal
>> > > > > > error.  could not instantiate
>> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
>> > arguments
>> > > > '[
>> > > > > > ]'
>> > > > > >
>> > > > > > Looking at the log file it gives the following:
>> > > > > >
>> > > > > > java.lang.RuntimeException: could not instantiate
>> > > > > > 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with
>> > arguments
>> > > > '[
>> > > > > > ]'
>> > > > > > ...
>> > > > > > Caused by: java.lang.reflect.InvocationTargetException
>> > > > > > ...
>> > > > > > Caused by: java.lang.NoClassDefFoundError:
>> > > > com/google/common/collect/Maps
>> > > > > > ...
>> > > > > > Caused by: java.lang.ClassNotFoundException:
>> > > > > com.google.common.collect.Maps
>> > > > > >
>> > > > > > What is confusing me is that LZO compression and decompression
>> > works
>> > > > fine
>> > > > > > when I'm running a normal java based map reduce program so I
>> feel
>> > as
>> > > > > though
>> > > > > > the libraries have to be in the right place with the right
>> settings
>> > > for
>> > > > > > java.library.path.  Otherwise how would normal java map-reduce
>> > work?
>> > > >  Is
>> > > > > > there some other location I need to set JAVA_LIBRARY_PATH for
>> pig
>> > to
>> > > > pick
>> > > > > > it
>> > > > > > up?  My understanding was that it would get this from
>> > hadoop-env.sh.
>> > > >  Are
>> > > > > > the missing com.google.common.collect.Maps the real problem
>> here?
>> > > >  Thank
>> > > > > > you
>> > > > > > for any help!
>> > > > > >
>> > > > > > ~Ed
>> > > > > >
>> > > > > > On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <
>> > dvryaboy@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi Ed,
>> > > > > > > Elephant-bird only works with 0.6 at the moment. There's a
>> branch
>> > > for
>> > > > > 0.7
>> > > > > > > that I haven't tested:
>> > http://github.com/hirohanin/elephant-bird/
>> > > > > > > Try it, let me know if it works.
>> > > > > > >
>> > > > > > > -D
>> > > > > > >
>> > > > > > > On Tue, Sep 21, 2010 at 2:22 PM, pig <ha...@gmail.com>
>> > wrote:
>> > > > > > >
>> > > > > > > > Hello,
>> > > > > > > >
>> > > > > > > > I have a small cluster up and running with LZO compressed
>> files
>> > > in
>> > > > > it.
>> > > > > > >  I'm
>> > > > > > > > using the lzo compression libraries available at
>> > > > > > > > http://github.com/kevinweil/hadoop-lzo (thank you for
>> > > maintaining
>> > > > > > this!)
>> > > > > > > >
>> > > > > > > > So far everything works fine when I write regular map-reduce
>> > > jobs.
>> > > >  I
>> > > > > > can
>> > > > > > > > read in lzo files and write out lzo files without any
>> problem.
>> > > > > > > >
>> > > > > > > > I'm also using Pig 0.7 and it appears to be able to read LZO
>> > > files
>> > > > > out
>> > > > > > of
>> > > > > > > > the box using the default LoadFunc (PigStorage).  However, I
>> am
>> > > > > > currently
>> > > > > > > > testing a large LZO file (20GB) which I indexed using the
>> > > > LzoIndexer
>> > > > > > and
>> > > > > > > > Pig
>> > > > > > > > does not appear to be making use of the indexes.  The pig
>> > scripts
>> > > > > that
>> > > > > > > I've
>> > > > > > > > run so far only have 3 mappers when processing the 20GB
>> file.
>> >  My
>> > > > > > > > understanding was that there should be 1 map for each block
>> > > (256MB
>> > > > > > > blocks)
>> > > > > > > > so about 80 mappers when processing the 20GB lzo file.  Does
>> > Pig
>> > > > 0.7
>> > > > > > > > support
>> > > > > > > > indexed lzo files with the default load function?
>> > > > > > > >
>> > > > > > > > If not, I was looking at elephant-bird and noticed it is
>> only
>> > > > > > compatible
>> > > > > > with Pig 0.6 and not 0.7+.  Is that accurate?  What would be
>> the
>> > > > > > > recommended
>> > > > > > > > solution for processing indexed lzo files using Pig 0.7?
>> > > > > > > >
>> > > > > > > > Thank you for any assistance!
>> > > > > > > >
>> > > > > > > > ~Ed
>> > > > > > > >
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by pig <ha...@gmail.com>.
Ah,

I didn't realize I needed to put the jars on all the nodes, since the error is
being thrown before the pig script actually executes (it's throwing the
error in the parsing stage).  I assumed that since the pig script hadn't
executed yet, it wasn't doing anything with the Hadoop nodes.

I will try adding PIG_CLASSPATH to my hadoop-env.sh and will then put the
jar files on all the slave nodes.  Hopefully that will solve the problem.
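For reference, a minimal sketch of that client-side setup, assuming the jars were unpacked under /usr/lib/elephant-bird (a hypothetical path) and that bin/pig appends PIG_CLASSPATH to the JVM classpath as described later in this thread:

```shell
# Hypothetical install location -- adjust to wherever the
# elephant-bird jars actually live on each machine.
EB_LIB=/usr/lib/elephant-bird/lib

# Exporting PIG_CLASSPATH before starting grunt makes the jars
# visible while the script is still being parsed on the client.
# The quotes keep the * literal; the JVM expands it, not the shell.
export PIG_CLASSPATH="$EB_LIB/*"
echo "$PIG_CLASSPATH"
```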

~Ed

On Wed, Sep 22, 2010 at 3:28 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> try PIG_CLASSPATH
>
> Oh and you might need to explicitly register them.. sorry, forgot. We just
> have them on the hadoop classpath on the nodes themselves, so we don't have
> to do that, but you might if you are starting fresh.
>
> -D

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
try PIG_CLASSPATH

Oh and you might need to explicitly register them.. sorry, forgot. We just
have them on the hadoop classpath on the nodes themselves, so we don't have
to do that, but you might if you are starting fresh.
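Since REGISTER takes one jar per statement, one way to generate the statements for everything in lib/ is a quick shell loop. This is only a sketch with stand-in jar names (elephant-bird's real lib/ includes, among others, the Guava jar that provides com.google.common.collect.Maps):

```shell
# Create a scratch lib dir with stand-in jars; in practice this
# would be elephant-bird's real lib/ directory.
libdir=$(mktemp -d)
touch "$libdir/guava.jar" "$libdir/protobuf-java.jar"

# Emit one REGISTER statement per jar; prepend the result
# to the top of the pig script.
for jar in "$libdir"/*.jar; do
  echo "REGISTER $jar"
done > register.pig
cat register.pig
```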

-D

On Wed, Sep 22, 2010 at 12:01 PM, pig <ha...@gmail.com> wrote:

> [foo]$ echo $CLASSPATH
> :/usr/lib/elephant-bird/lib/*
>
> This has been set for both user foo and hadoop but I still get the same
> error.  Is this the correct environment variable to be setting?
>
> Thank you!
>
> ~Ed

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by pig <ha...@gmail.com>.
[foo]$ echo $CLASSPATH
:/usr/lib/elephant-bird/lib/*

This has been set for both user foo and hadoop but I still get the same
error.  Is this the correct environment variable to be setting?

Thank you!

~Ed


On Wed, Sep 22, 2010 at 2:46 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> elephant-bird/lib/* (the * is important)

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
elephant-bird/lib/* (the * is important)
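The reason the * matters: a bare directory entry on a Java classpath only exposes loose .class files under it, while the dir/* wildcard (supported since Java 6) expands to every jar in the directory. Illustrated with the hypothetical path used above:

```shell
# Does NOT pick up jars: a plain directory entry is only
# searched for .class files, never for jars inside it.
export CLASSPATH=/usr/lib/elephant-bird/lib

# Picks up every .jar in the directory. Quote it so the shell
# does not expand the * itself -- the JVM does the expansion.
export CLASSPATH='/usr/lib/elephant-bird/lib/*'
echo "$CLASSPATH"
```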

On Wed, Sep 22, 2010 at 11:42 AM, pig <ha...@gmail.com> wrote:

> Well I thought that would be a simple enough fix but no luck so far.
>
> I've added the elephant-bird/lib directory (which I made world readable and
> executable) to the CLASSPATH, JAVA_LIBRARY_PATH and HADOOP_CLASSPATH as
> both
> the user running grunt and the hadoop user. (sort of a shotgun approach)
>
> I still get the error where it complains about "no gplcompression" and in the
> log it has an error where it can't find com.google.common.collect.Maps
>
> Are these two separate problems or is it one problem that is causing two
> different errors?  Thank you for the help!
>
> ~Ed

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by pig <ha...@gmail.com>.
Well I thought that would be a simple enough fix but no luck so far.

I've added the elephant-bird/lib directory (which I made world readable and
executable) to the CLASSPATH, JAVA_LIBRARY_PATH and HADOOP_CLASSPATH as both
the user running grunt and the hadoop user. (sort of a shotgun approach)

I still get the error where it complains about "no gplcompression" and in the
log it has an error where it can't find com.google.common.collect.Maps

Are these two separate problems or is it one problem that is causing two
different errors?  Thank you for the help!
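They do look like two separate problems: the UnsatisfiedLinkError concerns the native libgplcompression library (resolved through java.library.path), while the NoClassDefFoundError concerns a missing jar (resolved through the classpath). A hedged sketch of handing both to Pig's local JVM, with assumed paths, and assuming bin/pig honors PIG_OPTS:

```shell
# Jar side: the Guava jar in elephant-bird/lib supplies
# com.google.common.collect.Maps.
export PIG_CLASSPATH='/usr/lib/elephant-bird/lib/*'

# Native side: point java.library.path at wherever the hadoop-lzo
# build dropped libgplcompression.so (this path is an assumption).
export PIG_OPTS='-Djava.library.path=/usr/lib/hadoop/lib/native/Linux-amd64-64'
echo "$PIG_OPTS"
```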

~Ed

On Wed, Sep 22, 2010 at 1:57 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> You need the jars in elephant-bird's lib/ on your classpath to run
> Elephant-Bird.

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
You need the jars in elephant-bird's lib/ on your classpath to run
Elephant-Bird.


On Wed, Sep 22, 2010 at 10:35 AM, pig <ha...@gmail.com> wrote:

> Thank you for pointing out the 0.7 branch.   I'm giving the 0.7 branch a
> shot and have run into a problem when trying to run the following test pig
> script:
>
> REGISTER elephant-bird-1.0.jar
> A = LOAD '/user/foo/input' USING
> com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
> B = LIMIT A 100;
> DUMP B;
>
> When I try to run this I get the following error:
>
> java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
>  ....
> ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without
> native-hadoop
> ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
> error.  could not instantiate
> 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[
> ]'
>
> Looking at the log file it gives the following:
>
> java.lang.RuntimeException: could not instantiate
> 'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[
> ]'
> ...
> Caused by: java.lang.reflect.InvocationTargetException
> ...
> Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
> ...
> Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps
>
> What is confusing me is that LZO compression and decompression works fine
> when I'm running a normal java based map reduce program so I feel as though
> the libraries have to be in the right place with the right settings for
> java.library.path.  Otherwise how would normal java map-reduce work?  Is
> there some other location I need to set JAVA_LIBRARY_PATH for pig to pick
> it
> up?  My understanding was that it would get this from hadoop-env.sh.  Is
> the missing com.google.common.collect.Maps the real problem here?  Thank
> you
> for any help!
>
> ~Ed
>
> On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dv...@gmail.com>
> wrote:
>
> > Hi Ed,
> > Elephant-bird only works with 0.6 at the moment. There's a branch for 0.7
> > that I haven't tested: http://github.com/hirohanin/elephant-bird/
> > Try it, let me know if it works.
> >
> > -D
> >
> > On Tue, Sep 21, 2010 at 2:22 PM, pig <ha...@gmail.com> wrote:
> >
> > > Hello,
> > >
> > > I have a small cluster up and running with LZO compressed files in
> > > it.  I'm using the lzo compression libraries available at
> > > http://github.com/kevinweil/hadoop-lzo (thank you for maintaining
> > > this!)
> > >
> > > So far everything works fine when I write regular map-reduce jobs.
> > > I can read in lzo files and write out lzo files without any problem.
> > >
> > > I'm also using Pig 0.7 and it appears to be able to read LZO files
> > > out of the box using the default LoadFunc (PigStorage).  However, I
> > > am currently testing a large LZO file (20GB) which I indexed using
> > > the LzoIndexer and Pig does not appear to be making use of the
> > > indexes.  The pig scripts that I've run so far only have 3 mappers
> > > when processing the 20GB file.  My understanding was that there
> > > should be 1 map for each block (256MB blocks) so about 80 mappers
> > > when processing the 20GB lzo file.  Does Pig 0.7 support indexed
> > > lzo files with the default load function?
> > >
> > > If not, I was looking at elephant-bird and noticed it is only
> > > compatible with Pig 0.6 and not 0.7+.  Is that accurate?  What
> > > would be the recommended solution for processing indexed lzo files
> > > using Pig 0.7?
> > >
> > > Thank you for any assistance!
> > >
> > > ~Ed
> > >
> >
>

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by pig <ha...@gmail.com>.
Thank you for pointing out the 0.7 branch.   I'm giving the 0.7 branch a
shot and have run into a problem when trying to run the following test pig
script:

REGISTER elephant-bird-1.0.jar
A = LOAD '/user/foo/input' USING
com.twitter.elephantbird.pig.load.LzoTokenizedLoader('\t');
B = LIMIT A 100;
DUMP B;

When I try to run this I get the following error:

java.lang.UnsatisfiedLinkError: no gplcompression in java.library.path
 ....
ERROR com.hadoop.compression.lzo.LzoCodec - Cannot load native-lzo without
native-hadoop
ERROR org.apache.pig.tools.grunt.Grunt - ERROR 2999: Unexpected internal
error.  could not instantiate
'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[
]'

Looking at the log file it gives the following:

java.lang.RuntimeException: could not instantiate
'com.twitter.elephantbird.pig.load.LzoTokenizedLoader' with arguments '[
]'
...
Caused by: java.lang.reflect.InvocationTargetException
...
Caused by: java.lang.NoClassDefFoundError: com/google/common/collect/Maps
...
Caused by: java.lang.ClassNotFoundException: com.google.common.collect.Maps

What is confusing me is that LZO compression and decompression works fine
when I'm running a normal java based map reduce program so I feel as though
the libraries have to be in the right place with the right settings for
java.library.path.  Otherwise how would normal java map-reduce work?  Is
there some other location I need to set JAVA_LIBRARY_PATH for pig to pick it
up?  My understanding was that it would get this from hadoop-env.sh.  Are
the missing com.google.common.collect.Maps the real problem here?  Thank you
for any help!
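
In case it helps, here is what I am experimenting with to work around both
errors.  The paths below are guesses for my own install, so treat this as a
sketch rather than a known-good recipe.  The idea is to hand the native
gplcompression library to Pig's JVM the same way hadoop-env.sh hands it to
regular map-reduce jobs, and to put a google-collections/guava jar (which is
where com.google.common.collect.Maps lives) on Pig's classpath:

```shell
# Sketch only -- directory and jar locations are assumptions about my
# install, adjust to match yours.

# Expose the native LZO library (libgplcompression) to Pig's JVM, the
# same way hadoop-env.sh does for plain map-reduce tasks.
export JAVA_LIBRARY_PATH=/opt/hadoop/lib/native/Linux-amd64-64
export PIG_OPTS="$PIG_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"

# com.google.common.collect.Maps comes from google-collections/guava,
# so that jar needs to be visible to Pig as well.
export PIG_CLASSPATH="$PIG_CLASSPATH:/opt/lib/google-collections-1.0.jar"
```

With those set, launching pig from the same shell should pick them up;
alternatively the jar can be REGISTERed inside the script like
elephant-bird-1.0.jar above.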

~Ed

On Tue, Sep 21, 2010 at 5:43 PM, Dmitriy Ryaboy <dv...@gmail.com> wrote:

> Hi Ed,
> Elephant-bird only works with 0.6 at the moment. There's a branch for 0.7
> that I haven't tested: http://github.com/hirohanin/elephant-bird/
> Try it, let me know if it works.
>
> -D
>
> On Tue, Sep 21, 2010 at 2:22 PM, pig <ha...@gmail.com> wrote:
>
> > Hello,
> >
> > I have a small cluster up and running with LZO compressed files in
> > it.  I'm using the lzo compression libraries available at
> > http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)
> >
> > So far everything works fine when I write regular map-reduce jobs.  I can
> > read in lzo files and write out lzo files without any problem.
> >
> > I'm also using Pig 0.7 and it appears to be able to read LZO files out of
> > the box using the default LoadFunc (PigStorage).  However, I am currently
> > testing a large LZO file (20GB) which I indexed using the LzoIndexer
> > and Pig does not appear to be making use of the indexes.  The pig
> > scripts that I've run so far only have 3 mappers when processing the
> > 20GB file.  My understanding was that there should be 1 map for each
> > block (256MB blocks) so about 80 mappers when processing the 20GB lzo
> > file.  Does Pig 0.7 support indexed lzo files with the default load
> > function?
> >
> > If not, I was looking at elephant-bird and noticed it is only
> > compatible with Pig 0.6 and not 0.7+.  Is that accurate?  What would
> > be the recommended solution for processing indexed lzo files using
> > Pig 0.7?
> >
> > Thank you for any assistance!
> >
> > ~Ed
> >
>

Re: Does Pig 0.7 support indexed LZO files? If not, does elephant-pig work with 0.7?

Posted by Dmitriy Ryaboy <dv...@gmail.com>.
Hi Ed,
Elephant-bird only works with 0.6 at the moment. There's a branch for 0.7
that I haven't tested: http://github.com/hirohanin/elephant-bird/
Try it, let me know if it works.

-D

On Tue, Sep 21, 2010 at 2:22 PM, pig <ha...@gmail.com> wrote:

> Hello,
>
> I have a small cluster up and running with LZO compressed files in it.  I'm
> using the lzo compression libraries available at
> http://github.com/kevinweil/hadoop-lzo (thank you for maintaining this!)
>
> So far everything works fine when I write regular map-reduce jobs.  I can
> read in lzo files and write out lzo files without any problem.
>
> I'm also using Pig 0.7 and it appears to be able to read LZO files out of
> the box using the default LoadFunc (PigStorage).  However, I am currently
> testing a large LZO file (20GB) which I indexed using the LzoIndexer and
> Pig
> does not appear to be making use of the indexes.  The pig scripts that I've
> run so far only have 3 mappers when processing the 20GB file.  My
> understanding was that there should be 1 map for each block (256MB blocks)
> so about 80 mappers when processing the 20GB lzo file.  Does Pig 0.7
> support
> indexed lzo files with the default load function?
>
> If not, I was looking at elephant-bird and noticed it is only compatible
> with Pig 0.6 and not 0.7+.  Is that accurate?  What would be the recommended
> solution for processing indexed lzo files using Pig 0.7?
>
> Thank you for any assistance!
>
> ~Ed
>