Posted to user@pig.apache.org by Cheolsoo Park <pi...@gmail.com> on 2012/06/08 02:31:19 UTC

Running e2e RubyUDFs test in MR mode

Hello,

I checked out branch-0.10, and I am trying to run the e2e RubyUDFs tests in MR
mode, but I am getting the following error:

java.lang.IllegalStateException: *Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb*
        at org.apache.pig.scripting.ScriptEngine.getScriptAsStream(ScriptEngine.java:145)
        at org.apache.pig.scripting.jruby.JrubyScriptEngine$RubyFunctions.getFromCache(JrubyScriptEngine.java:104)
        at org.apache.pig.scripting.jruby.JrubyScriptEngine$RubyFunctions.getFunctions(JrubyScriptEngine.java:120)
        at org.apache.pig.scripting.jruby.JrubyEvalFunc.initialize(JrubyEvalFunc.java:87)
        at org.apache.pig.scripting.jruby.JrubyEvalFunc.exec(JrubyEvalFunc.java:103)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:216)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:263)
        at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.getNext(PhysicalOperator.java:328)


Looking at the source code (ScriptEngine.java), I found
that scriptingudfs.rb should be found via classpath:

        if (file.exists()) {
            try {
                is = new FileInputStream(file);
            } catch (FileNotFoundException e) {
                throw new IllegalStateException("could not find existing file "+scriptPath, e);
            }
        } else {
            if (file.isAbsolute()) {
                *is = ScriptEngine.class.getResourceAsStream(scriptPath);*
            } else {
                is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
            }
        }
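
For readers who want to try the lookup order outside Pig, here is a minimal standalone sketch of the same logic. The class name ScriptLookupSketch and the helper classLoaderName are hypothetical and are not part of Pig. It relies on the documented behavior of Class.getResourceAsStream, which treats a name starting with "/" as an absolute resource name and drops that slash before asking the class loader. It also assumes Unix-style paths.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration class; not part of Pig.
public class ScriptLookupSketch {

    // Mirrors the quoted logic: try the file system first, then the classpath.
    // Returns null when neither lookup finds the script.
    static InputStream openScript(String scriptPath) {
        File file = new File(scriptPath);
        if (file.exists()) {
            try {
                return new FileInputStream(file);
            } catch (FileNotFoundException e) {
                throw new IllegalStateException("could not find existing file " + scriptPath, e);
            }
        }
        // Both branches end up passing a name that starts with "/".
        String resourceName = file.isAbsolute() ? scriptPath : "/" + scriptPath;
        return ScriptLookupSketch.class.getResourceAsStream(resourceName);
    }

    // The name the class loader is asked for once the leading "/" is dropped.
    static String classLoaderName(String scriptPath) {
        return scriptPath.startsWith("/") ? scriptPath.substring(1) : scriptPath;
    }

    public static void main(String[] args) throws IOException {
        String path = "/home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb";
        System.out.println("class loader name: " + classLoaderName(path));
        // try-with-resources skips close() when the resource is null.
        try (InputStream is = openScript(path)) {
            System.out.println("found: " + (is != null));
        }
    }
}
```

On a machine where the file does not exist, the only way this lookup can succeed is if the classpath has an entry whose name matches the path with the leading slash removed.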


Now I looked at the Job jar generated by Pig and found that
scriptingudfs.rb indeed exists in that jar:

 cheolsoo@localhost:~/workspace/pig-cheolsoo $ jar tvf Job9203441412304345930.jar | grep scriptingudfs.rb
   2491 Thu Jun 07 14:42:44 PDT 2012 */home/cheolsoo/pig-0.10/test/e2e/pig/testdist/scriptingudfs.rb*


Since scriptingudfs.rb is inside the Job jar, I imagine that
getResourceAsStream() should be able to find it, but apparently it doesn't.
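
One thing visible in the output above is that the entry listed in the jar ends in testdist/scriptingudfs.rb, while the path in the exception ends in testdist/libexec/ruby/scriptingudfs.rb. Whether that mismatch is the actual cause is not established here; the sketch below only shows, under that assumption, that a class loader resolves a resource by its exact entry name. The class JarLookupSketch and its helpers are hypothetical, and the shortened entry names stand in for the full paths.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical illustration class; not part of Pig.
public class JarLookupSketch {

    // Builds a throwaway jar containing a single entry with the given name.
    static Path buildJar(String entryName) throws IOException {
        Path jar = Files.createTempFile("job-sketch", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            jos.putNextEntry(new JarEntry(entryName));
            jos.write("# placeholder script\n".getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        return jar;
    }

    // True if a class loader over just this jar resolves the resource name.
    static boolean found(Path jar, String resourceName) throws IOException {
        URL[] urls = {jar.toUri().toURL()};
        try (URLClassLoader loader = new URLClassLoader(urls, null)) {
            return loader.getResource(resourceName) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        Path jar = buildJar("testdist/scriptingudfs.rb");
        // Same name as the entry in the jar.
        System.out.println(found(jar, "testdist/scriptingudfs.rb"));
        // Same file name, different directory.
        System.out.println(found(jar, "testdist/libexec/ruby/scriptingudfs.rb"));
        Files.delete(jar);
    }
}
```

The first lookup uses the entry name as stored and should succeed; the second asks for a path that is not in the jar and should not, even though the file name is the same.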

I am wondering if anyone has been able to run these tests in MR mode and could
give me some pointers. Any help would be appreciated!

Thanks,
Cheolsoo

p.s. The test works fine in local mode, which is not surprising
since scriptingudfs.rb would be found via the file system. I also see a similar
issue with the e2e Jython tests, where Jython scripts are not found, with the
following error:

2012-06-05 22:44:19,491 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2012-06-05 22:44:19,513 [main] ERROR org.apache.pig.tools.grunt.GruntParser - ERROR 2997: Unable to recreate exception from backed error: java.io.IOException: Deserialization error: could not instantiate 'org.apache.pig.scripting.jython.JythonFunction' with arguments '[/home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/python/scriptingudf.py, square]'

Re: Running e2e RubyUDFs test in MR mode

Posted by Subir S <su...@gmail.com>.
nice !

On Sun, Jun 10, 2012 at 12:51 AM, Cheolsoo Park <pi...@gmail.com>wrote:

> Hi Subir,
>
> Thanks for asking. In fact, I found out what's the issue and filed a jira:
> https://issues.apache.org/jira/browse/PIG-2745. Please find details from
> the jira.
>
> Cheolsoo
>
> On Sat, Jun 9, 2012 at 5:42 AM, Subir S <su...@gmail.com> wrote:
>
> >  can you pls share a snippet on how you are using these udfs?

Re: Running e2e RubyUDFs test in MR mode

Posted by Cheolsoo Park <pi...@gmail.com>.
Hi Subir,

Thanks for asking. In fact, I found out what the issue is and filed a JIRA:
https://issues.apache.org/jira/browse/PIG-2745. Please see the JIRA for
details.

Cheolsoo

On Sat, Jun 9, 2012 at 5:42 AM, Subir S <su...@gmail.com> wrote:

>  can you pls share a snippet on how you are using these udfs?

Re: Running e2e RubyUDFs test in MR mode

Posted by Subir S <su...@gmail.com>.
 can you pls share a snippet on how you are using these udfs?
