Posted to dev@pig.apache.org by "Cheolsoo Park (JIRA)" <ji...@apache.org> on 2012/06/09 21:19:43 UTC

[jira] [Created] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Cheolsoo Park created PIG-2745:
----------------------------------

             Summary: Pig e2e test RubyUDFs fails in MR mode when running from tarball
                 Key: PIG-2745
                 URL: https://issues.apache.org/jira/browse/PIG-2745
             Project: Pig
          Issue Type: Bug
    Affects Versions: 0.10.1
            Reporter: Cheolsoo Park


To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from the tarball (not from installed Pig - please see why below). Either pseudo-distributed-mode or full-mode Hadoop can be used.

{code}
ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
{code}

The test fails with the following error:

{code}
java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
{code}

Now look at the job jar generated by Pig and search for the "scriptingudfs.rb" that the error complains about.

To save the job jar in /tmp, I had to comment out the following line in JobControlCompiler.java:

{code}
submitJarFile.deleteOnExit();
{code}

It can be seen that the absolute path of the script is stored in the job jar as follows:

{code}
[cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
  2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
{code}

Looking at the getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" is supposed to be found in the job jar, but it is not. The reason is that getResourceAsStream("/x") looks for "x" (without the leading "/"), not "/x", in the jar. Since "scriptingudfs.rb" is stored under its absolute path, with the leading "/", getResourceAsStream(scriptPath) does not find it.

{code}
File file = new File(scriptPath);
if (file.exists()) {
    try {
        is = new FileInputStream(file);
    } catch (FileNotFoundException e) {
        throw new IllegalStateException("could not find existing file "+scriptPath, e);
    }
} else {
    if (file.isAbsolute()) {
        is = ScriptEngine.class.getResourceAsStream(scriptPath);
    } else {
        is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
    }
}
{code}
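
The lookup rule described above can be illustrated with a small, self-contained model. This is only a sketch, not Pig code: the class ResourceLookupSketch, its methods, and the in-memory Set that stands in for the job jar are hypothetical names introduced for illustration. The resolve() method follows the name resolution documented for Class.getResourceAsStream (a leading "/" is stripped; otherwise the package path is prepended).

```java
import java.util.Collections;
import java.util.Set;

public class ResourceLookupSketch {

    // Mirrors the documented name resolution of Class.getResourceAsStream:
    // a leading "/" marks an absolute resource name and is stripped;
    // otherwise the package of the class (with "/" separators) is prepended.
    public static String resolve(String packageName, String name) {
        if (name.startsWith("/")) {
            return name.substring(1);
        }
        if (packageName.isEmpty()) {
            return name;
        }
        return packageName.replace('.', '/') + "/" + name;
    }

    // Stand-in for the jar: a lookup succeeds only on an exact entry-name match.
    public static boolean found(Set<String> jarEntries, String packageName, String name) {
        return jarEntries.contains(resolve(packageName, name));
    }

    public static void main(String[] args) {
        String script =
            "/home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb";
        // Entry stored with the leading "/" (the failing job jar).
        Set<String> badJar = Collections.singleton(script);
        // Entry stored without the leading "/" (the patched job jar).
        Set<String> goodJar = Collections.singleton(script.substring(1));

        System.out.println(found(badJar, "org.apache.pig.scripting", script));  // false
        System.out.println(found(goodJar, "org.apache.pig.scripting", script)); // true
    }
}
```

In this model the stored entry name and the resolved lookup name differ only by the leading "/", which is enough for the lookup to miss.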

In fact, the test appears to pass if you run it in local mode or from installed Pig. The reason is that "scriptingudfs.rb" exists in the local file system (e.g. /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb), so it is found there before the jar lookup is attempted.

The fix on UNIX seems straightforward. When registering UDF scripts, we can simply remove the leading "/". For example:

{code:title=src/org/apache/pig/PigServer.java}
-        pigContext.addScriptFile(f.getPath());
+        String key = f.isAbsolute() ? f.getPath().substring(1) : f.getPath();
+        pigContext.addScriptFile(key, f.getPath());
{code}

As a result, the UDF scripts are stored without the leading "/" in the job jar as follows:

{code}
[cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
  2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
{code}

But this won't work on Windows or with S3, where the root dir is not "/".
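
One possible direction for the non-UNIX roots mentioned above is to normalize the path into a jar entry name before registering it. The following is a minimal sketch under stated assumptions, not the attached patch and not an existing Pig API: JarKeySketch and toJarKey are hypothetical names, and the handling of drive letters and URI schemes is a guess at what "root" means in those environments.

```java
public class JarKeySketch {

    // Hypothetical helper: turn a script path into a jar entry name that
    // never starts with "/", whatever the root of the path looks like.
    public static String toJarKey(String path) {
        // Use "/" as the only separator (Windows paths use "\").
        String key = path.replace('\\', '/');

        int scheme = key.indexOf("://");
        if (scheme > 0) {
            // Drop a URI scheme such as "s3://" or "hdfs://".
            key = key.substring(scheme + 3);
        } else if (key.length() >= 2
                && Character.isLetter(key.charAt(0))
                && key.charAt(1) == ':') {
            // Drop a Windows drive prefix such as "C:".
            key = key.substring(2);
        }

        // Drop every leading "/" so a class loader lookup can match the entry.
        int start = 0;
        while (start < key.length() && key.charAt(start) == '/') {
            start++;
        }
        return key.substring(start);
    }

    public static void main(String[] args) {
        System.out.println(toJarKey("/home/cheolsoo/udfs/scriptingudfs.rb"));
        System.out.println(toJarKey("C:\\pig\\udfs\\scriptingudfs.rb"));
        System.out.println(toJarKey("s3://bucket/udfs/scriptingudfs.rb"));
        System.out.println(toJarKey("udfs/scriptingudfs.rb"));
    }
}
```

Note that dropping the drive letter or scheme means two scripts that differ only in that prefix would map to the same key, so a real fix would have to decide whether that is acceptable. The lookup side would also need to apply the same normalization.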

Alternatively, we could store the UDF scripts in the job jar under their file names instead of their full absolute paths. But this would prevent registering more than one UDF script with the same file name from different paths.
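
The collision in the file-name alternative can be shown with a short sketch. FileNameKeySketch and register are hypothetical names used only for illustration; the map stands in for the set of entries that would be written to the job jar.

```java
import java.io.File;
import java.util.LinkedHashMap;
import java.util.Map;

public class FileNameKeySketch {

    // Hypothetical registry keyed by bare file name.
    // Returns the mapping of jar entry name -> source path after all registrations.
    public static Map<String, String> register(String... scriptPaths) {
        Map<String, String> jarEntries = new LinkedHashMap<>();
        for (String path : scriptPaths) {
            String key = new File(path).getName();
            // A later script with the same file name silently replaces the earlier one.
            jarEntries.put(key, path);
        }
        return jarEntries;
    }

    public static void main(String[] args) {
        Map<String, String> entries = register("/a/udfs.rb", "/b/udfs.rb");
        System.out.println(entries); // {udfs.rb=/b/udfs.rb}
    }
}
```

Only one of the two scripts survives under the key "udfs.rb", which is the limitation described above.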

I am wondering if anyone has a better suggestion. Thanks!

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2745:
-------------------------------

    Attachment: PIG-2745.patch
    
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>         Attachments: PIG-2745.patch
>
>


        

[jira] [Updated] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2745:
----------------------------

    Attachment: enable_scripting_tests_23.patch
    
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java, enable_scripting_tests_23.patch
>
>
> To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from the tarball (not from installed Pig - please see why below). Either pseudo-distributed-mode or full-mode Hadoop can be used.
> {code}
> ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
> {code}
> The test fails with the following error:
> {code}
> java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the job jar generated by Pig, "scriptingudfs.rb" can be found as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" is supposed to be read from the job jar, but it is not. The reason is that getResourceAsStream("/x") looks for "x" (without the leading "/"), not "/x". Since "scriptingudfs.rb" is stored with its absolute path, getResourceAsStream(scriptPath) does not find it.
> {code}
> File file = new File(scriptPath);
> if (file.exists()) {
>     try {
>         is = new FileInputStream(file);
>     } catch (FileNotFoundException e) {
>         throw new IllegalStateException("could not find existing file "+scriptPath, e);
>     }
> } else {
>     if (file.isAbsolute()) {
>         is = ScriptEngine.class.getResourceAsStream(scriptPath);
>     } else {
>         is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
>     }
> }
> {code}
> In fact, the test passes if you run it in local mode or from installed Pig. The reason is that "scriptingudfs.rb" is found in the local file system (e.g. /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb).
> The fix seems straightforward. Attached is the patch that removes the leading "/" when registering UDF scripts so that they are stored without the leading "/" in the job jar as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Thanks!


        

[jira] [Commented] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13396965#comment-13396965 ] 

Daniel Dai commented on PIG-2745:
---------------------------------

Hi, Cheolsoo,
You are right. This issue is fixed as a byproduct of PIG-2623, which converts relative paths to absolute paths. All the scripting tests now pass for Hadoop 23. I will enable those tests for 23.

However, there is one more hole left. If a UDF script imports another Python module, Pig cannot pack or reference the path of the dependent module correctly. Here is an example:

udf.py:
{code}
from base import square

@outputSchemaFunction("squaresquareSchema")
def squaresquare(num):
    if num == None:
        return None
    return (square(num)*square(num))

@schemaFunction("squaresquareSchema")
def squaresquareSchema(input):
    return input
{code}

base.py:
{code}
def square(num):
    if num == None:
        return None
    return ((num)*(num))
{code}

Pig script:
{code}
register 'udf.py' using jython as myfuncs;

a = load '1.txt' as (a0:int);
b = foreach a generate myfuncs.squaresquare(a0);
dump b;
{code}

Pig incorrectly packs base.py as /base.py in job.jar and fails to reference it in the backend. This happens with both Hadoop 20 and 23.
                
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java, enable_scripting_tests_23.patch
>
>


        

[jira] [Updated] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2745:
----------------------------

    Attachment: enable_scripting_tests_23.patch
    
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java, enable_scripting_tests_23.patch
>
>


        

[jira] [Updated] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2745:
-------------------------------

    Description: 
To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from the tarball (not from installed Pig - please see why below). Either pseudo-distributed-mode or full-mode Hadoop can be used.

{code}
ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
{code}

The test fails with the following error:

{code}
java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
{code}

Looking at the job jar generated by Pig, "scriptingudfs.rb" can be found as follows:

{code}
[cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
  2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
{code}

Looking at the getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" is supposed to be read from the job jar, but it is not. The reason is that getResourceAsStream("/x") looks for "x" (without the leading "/"), not "/x". Since "scriptingudfs.rb" is stored with its absolute path, getResourceAsStream(scriptPath) does not find it.

{code}
File file = new File(scriptPath);
if (file.exists()) {
    try {
        is = new FileInputStream(file);
    } catch (FileNotFoundException e) {
        throw new IllegalStateException("could not find existing file "+scriptPath, e);
    }
} else {
    if (file.isAbsolute()) {
        is = ScriptEngine.class.getResourceAsStream(scriptPath);
    } else {
        is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
    }
}
{code}

In fact, the test passes if you run it in local mode or from installed Pig. The reason is that "scriptingudfs.rb" is found in the local file system (e.g. /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb).

The fix seems straightforward. Attached is the patch that removes the leading "/" when registering UDF scripts so that they are stored without the leading "/" in the job jar as follows:

{code}
[cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
  2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
{code}

Thanks!

  was:
To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from the tarball (not from installed Pig - please see why below). Either pseudo-distributed-mode or full-mode Hadoop can be used.

{code}
ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
{code}

The test fails with the following error:

{code}
java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
{code}

Now look at the job jar generated by Pig, and search for "scriptingudfs.rb" that the error complains about.

To save the job jar in /tmp, I had to comment out the following line in JobControlCompiler.java:

{code}
submitJarFile.deleteOnExit();
{code}

It can be seen that the absolute path of the script is stored in the job jar as follows:

{code}
[cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
  2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
{code}

Looking at getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" seems supposed to be able to be found from the jar, but it is not. The reason is because getResourceAsStream("/x") looks for "x" (without the leading "/") not "/x" in the jar. Since "scriptingudfs.rb" is stored as the absolute path with the leading "/", it ends up being not found by getResourceAsStream(scriptPath).

{code}
File file = new File(scriptPath);
if (file.exists()) {
    try {
        is = new FileInputStream(file);
    } catch (FileNotFoundException e) {
        throw new IllegalStateException("could not find existing file "+scriptPath, e);
    }
} else {
    if (file.isAbsolute()) {
        is = ScriptEngine.class.getResourceAsStream(scriptPath);
    } else {
        is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
    }
}
{code}

In fact, the test appears to pass if you run in local mode or from installed Pig. The reason is because "scriptingudfs.rb" exists in local file system (e.g /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb), so it is found in file system.

The fix in UNIX seems straightforward. When registering UDF scripts, we can simply remove the leading "/". For example,

{code:title=src/org/apache/pig/PigServer.java}
-        pigContext.addScriptFile(f.getPath());
+        String key = f.isAbsolute() ? f.getPath().substring(1) : f.getPath();
+        pigContext.addScriptFile(key, f.getPath());
{code}

This results in that the UDF scripts are stored without the leading "/" in the job jar as follows:

{code}
[cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
  2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
{code}

But this won't work with Windows and S3 as their root dir is not "/".

Alternatively, we could store the UDF scripts with the file name instead of the full absolute path in the job jar. But this will disallow more than one UDF scripts with the same name but in different paths to be registered.

I am wondering if anyone has a better suggestion. Thanks!

    
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>         Attachments: PIG-2745.patch
>
>


        

[jira] [Updated] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Alan Gates (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-2745:
----------------------------

    Status: Patch Available  (was: Open)
    
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>         Attachments: PIG-2745.patch
>
>
> To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from the tarball (not from installed Pig - please see why below). Either pseudo-distributed-mode or full-mode Hadoop can be used.
> {code}
> ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
> {code}
> The test fails with the following error:
> {code}
> java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the job jar generated by Pig, "scriptingudfs.rb" can be found as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" is supposed to be read from the job jar, but it is not. The reason is because getResourceAsStream("/x") looks for "x" (without the leading "/") not "/x". Since "scriptingudfs.rb" is stored with it absolute path, it ends up being not found by getResourceAsStream(scriptPath).
> {code}
> File file = new File(scriptPath);
> if (file.exists()) {
>     try {
>         is = new FileInputStream(file);
>     } catch (FileNotFoundException e) {
>         throw new IllegalStateException("could not find existing file "+scriptPath, e);
>     }
> } else {
>     if (file.isAbsolute()) {
>         is = ScriptEngine.class.getResourceAsStream(scriptPath);
>     } else {
>         is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
>     }
> }
> {code}
> In fact, the test passes if you run it in local mode or from an installed Pig, because "scriptingudfs.rb" is then found in the local file system (e.g. /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb).
> The fix seems straightforward. The attached patch removes the leading "/" when registering UDF scripts, so that they are stored without it in the job jar as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Thanks!
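
The lookup behaviour described above can be reproduced outside Pig. The following is a minimal sketch, not the attached patch or Test001.java: the class LeadingSlashDemo and its helpers writeJar() and isFound() are hypothetical names, and a plain URLClassLoader stands in for whatever class loader the task JVM actually uses. It writes a one-entry jar and asks the loader for the slash-free name, which is the name Class.getResourceAsStream() requests once it has stripped the leading "/" from an absolute resource name.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

// Hypothetical demo class; none of these names come from Pig.
final class LeadingSlashDemo {

    // Writes a temporary jar that contains exactly one entry with the given name.
    static File writeJar(String entryName) {
        try {
            File jar = File.createTempFile("demo", ".jar");
            jar.deleteOnExit();
            try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar))) {
                out.putNextEntry(new JarEntry(entryName));
                out.write("# ruby udf".getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
            return jar;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns true if a class loader over this jar can open the named resource.
    static boolean isFound(File jar, String resourceName) {
        try (URLClassLoader loader =
                 new URLClassLoader(new URL[] { jar.toURI().toURL() }, null);
             InputStream is = loader.getResourceAsStream(resourceName)) {
            return is != null;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The resource name as requested after the leading "/" is stripped.
        String lookup = "home/user/scriptingudfs.rb";
        File bad = writeJar("/home/user/scriptingudfs.rb");  // entry keeps the leading "/"
        File good = writeJar("home/user/scriptingudfs.rb");  // entry stored without it
        System.out.println("bad.jar found:  " + isFound(bad, lookup));
        System.out.println("good.jar found: " + isFound(good, lookup));
    }
}
```

The entry names here are shortened stand-ins for the real script path; the point is only that the jar entry name and the requested resource name have to match exactly.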

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2745:
----------------------------

    Attachment:     (was: enable_scripting_tests_23.patch)
    
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java, enable_scripting_tests_23.patch
>
>

        

[jira] [Commented] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393644#comment-13393644 ] 

Daniel Dai commented on PIG-2745:
---------------------------------

Looks good. I also attached Java code to demonstrate the problem. The patch fixes the issue in Ruby. The issue with Python relative paths is still there.
                
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java
>
>

        

[jira] [Updated] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheolsoo Park updated PIG-2745:
-------------------------------

    Attachment: PIG-2745-2.patch

I updated the patch to handle not only a leading "/" but also "./".

In fact, it is not necessary to worry about "./" since FileLocalizer.fetchFile() already converts relative paths to absolute paths; nevertheless, it seems like a good idea to make this method more robust anyway. 

In addition, I replaced substring(1) with
{code}replaceFirst("^\\./|^/", ""){code} because the latter seems more intuitive.
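
As a rough illustration of what that expression does to a script path, here is a minimal sketch. ScriptKeyDemo and toJarKey() are hypothetical names used only for this example; only the replaceFirst() call itself comes from the comment above, and the sample paths are made up.

```java
// Hypothetical demo class; only the regular expression is taken from the patch comment.
final class ScriptKeyDemo {

    // Strips a single leading "./" or "/" so the result can serve as a jar entry name.
    static String toJarKey(String path) {
        return path.replaceFirst("^\\./|^/", "");
    }

    public static void main(String[] args) {
        String[] samples = {
            "/home/user/udfs/scriptingudfs.rb",  // absolute path
            "./udfs/scriptingudfs.rb",           // explicit relative path
            "udfs/scriptingudfs.rb",             // already slash-free
            "../udfs/scriptingudfs.rb"           // parent-relative path, not matched
        };
        for (String s : samples) {
            System.out.println(s + " -> " + toJarKey(s));
        }
    }
}
```

Note that the pattern is anchored at the start of the string, so a path beginning with "../" passes through unchanged.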
                
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>         Attachments: PIG-2745-2.patch, PIG-2745.patch
>
>

        

[jira] [Commented] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13296047#comment-13296047 ] 

Rohini Palaniswamy commented on PIG-2745:
-----------------------------------------

+1. Tested this patch with relative path and absolute path for 20.205 and 23. Works fine. 

Daniel,
   Can you include this in 0.10 as well? Without this, scripting UDFs do not work on 23 with either relative or absolute paths. The e2e tests currently marked as ignored for MAPREDUCE-3700 will also have to be enabled again.
                
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>         Attachments: PIG-2745-2.patch, PIG-2745.patch
>
>

        

[jira] [Commented] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Rohini Palaniswamy (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13397052#comment-13397052 ] 

Rohini Palaniswamy commented on PIG-2745:
-----------------------------------------

Cheolsoo,
   I am looking into the issue Daniel mentioned. Should have a fix soon. 
   
                
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java, enable_scripting_tests_23.patch
>
>

        

[jira] [Updated] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2745:
----------------------------

       Resolution: Fixed
    Fix Version/s: 0.10.1
                   0.11
         Assignee: Cheolsoo Park
     Hadoop Flags: Reviewed
           Status: Resolved  (was: Patch Available)

Patch committed to 0.10/trunk. Thanks Cheolsoo!
                
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java
>
>

        

[jira] [Commented] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13292415#comment-13292415 ] 

Cheolsoo Park commented on PIG-2745:
------------------------------------

I also see the same issue with the e2e Scripting tests, where Jython UDF scripts are not found in the classpath. Applying the change that I described lets those tests pass as well.
                
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>
> To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from the tarball (not from installed Pig - please see why below). Either pseudo-distributed-mode or full-mode Hadoop can be used.
> {code}
> ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
> {code}
> The test fails with the following error:
> {code}
> java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Now look at the job jar generated by Pig, and search for "scriptingudfs.rb" that the error complains about.
> To save the job jar in /tmp, I had to comment out the following line in JobControlCompiler.java:
> {code}
> submitJarFile.deleteOnExit();
> {code}
> It can be seen that the absolute path of the script is stored in the job jar as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" is apparently supposed to be found in the jar, but it is not. The reason is that getResourceAsStream("/x") looks for "x" (without the leading "/"), not "/x", in the jar. Since "scriptingudfs.rb" is stored under its absolute path with the leading "/", getResourceAsStream(scriptPath) does not find it.
> {code}
> File file = new File(scriptPath);
> if (file.exists()) {
>     try {
>         is = new FileInputStream(file);
>     } catch (FileNotFoundException e) {
>         throw new IllegalStateException("could not find existing file "+scriptPath, e);
>     }
> } else {
>     if (file.isAbsolute()) {
>         is = ScriptEngine.class.getResourceAsStream(scriptPath);
>     } else {
>         is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
>     }
> }
> {code}
> In fact, the test appears to pass if you run it in local mode or from an installed Pig. The reason is that "scriptingudfs.rb" exists in the local file system (e.g. /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb), so it is found there before the jar is consulted.
> The fix in UNIX seems straightforward. When registering UDF scripts, we can simply remove the leading "/". For example,
> {code:title=src/org/apache/pig/PigServer.java}
> -        pigContext.addScriptFile(f.getPath());
> +        String key = f.isAbsolute() ? f.getPath().substring(1) : f.getPath();
> +        pigContext.addScriptFile(key, f.getPath());
> {code}
> As a result, the UDF scripts are stored without the leading "/" in the job jar as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> But this won't work with Windows and S3 as their root dir is not "/".
> Alternatively, we could store the UDF scripts in the job jar under the file name instead of the full absolute path. But this would prevent registering more than one UDF script with the same name in different paths.
> I am wondering if anyone has a better suggestion. Thanks!


        

[jira] [Commented] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Cheolsoo Park (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13393660#comment-13393660 ] 

Cheolsoo Park commented on PIG-2745:
------------------------------------

Hi Daniel, thanks for submitting my patch!

I am wondering why you think that the issue with relative paths for Python still exists. In my YARN cluster, the Scripting_* tests (excluded due to MAPREDUCE-3700) all pass. (Technically, I am using Hadoop-2.0.0, but that shouldn't make a difference.) I can also manually verify that it works in Grunt shell.

My fix shouldn't be Ruby-specific since the problem is with PigServer stuffing any UDF scripts into the job jar.

Looking at your test code, one thing that I hadn't thought about is "../". That shouldn't be an issue now, since in the registerCode() method relative paths are always converted to absolute paths by FileLocalizer.fetchFile(). Nevertheless, handling "../" as well might be a good idea to make that method more robust.

Thanks!
                
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.11, 0.10.1
>
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java
>
>
> To reproduce the issue, please run the e2e test "RubyUDFs_1" in MR mode from the tarball (not from installed Pig - please see why below). Either pseudo-distributed-mode or full-mode Hadoop can be used.
> {code}
> ant -Dhadoopversion=23 -Dharness.old.pig=`pwd` -Dharness.cluster.conf=/etc/hadoop/conf/ -Dharness.cluster.bin=/usr/lib/hadoop/bin/hadoop test-e2e -Dtests.to.run="-t RubyUDFs_1"
> {code}
> The test fails with the following error:
> {code}
> java.lang.IllegalStateException: Could not initialize interpreter (from file system or classpath) with /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at the job jar generated by Pig, "scriptingudfs.rb" can be found as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf bad.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 /home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Looking at getScriptAsStream() method in ScriptEngine.java, "scriptingudfs.rb" is supposed to be read from the job jar, but it is not. The reason is because getResourceAsStream("/x") looks for "x" (without the leading "/") not "/x". Since "scriptingudfs.rb" is stored with it absolute path, it ends up being not found by getResourceAsStream(scriptPath).
> {code}
> File file = new File(scriptPath);
> if (file.exists()) {
>     try {
>         is = new FileInputStream(file);
>     } catch (FileNotFoundException e) {
>         throw new IllegalStateException("could not find existing file "+scriptPath, e);
>     }
> } else {
>     if (file.isAbsolute()) {
>         is = ScriptEngine.class.getResourceAsStream(scriptPath);
>     } else {
>         is = ScriptEngine.class.getResourceAsStream("/" + scriptPath);
>     }
> }
> {code}
> In fact, the test passes if you run in local mode or from installed Pig. The reason is because "scriptingudfs.rb" is found in local file system (e.g /usr/share/pig/test/e2e/pig/udfs/ruby/scriptingudfs.rb).
> The fix seems straightforward. Attached is the patch that removes the leading "/" when registering UDF scripts so that they are stored without the leading "/" in the job jar as follows:
> {code}
> [cheolsoo@c1405 pig-cheolsoo]$ jar tvf good.jar | grep scriptingudfs.rb
>   2491 Fri Jun 08 15:52:08 PDT 2012 home/cheolsoo/pig-0.10/test/e2e/pig/testdist/libexec/ruby/scriptingudfs.rb
> {code}
> Thanks!
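
The lookup behavior described above can be sketched in isolation. This is only an illustration under stated assumptions, not the attached patch: the class JarEntryNameSketch and the helpers toJarEntryName() and roundTrip() are hypothetical names that do not exist in Pig. The sketch strips the leading "/" from a script path, stores the script in a temporary jar under that relative name, and reads it back through a class loader. It queries the class loader directly with the relative name, which is the name Class.getResourceAsStream("/x") resolves to after it drops the leading "/".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarOutputStream;

public class JarEntryNameSketch {

    /** Hypothetical helper: drop leading "/" characters so the jar entry name is relative. */
    public static String toJarEntryName(String scriptPath) {
        int start = 0;
        while (start < scriptPath.length() && scriptPath.charAt(start) == '/') {
            start++;
        }
        return scriptPath.substring(start);
    }

    /**
     * Hypothetical helper: writes the content into a temporary one-entry jar under the
     * relative entry name, then reads it back through a class loader.
     * Returns null if the class loader cannot find the entry.
     */
    public static String roundTrip(String scriptPath, String content) {
        String entryName = toJarEntryName(scriptPath);
        try {
            Path jar = Files.createTempFile("jobjar", ".jar");
            jar.toFile().deleteOnExit();

            // Store the script under the relative name, e.g. "home/user/scriptingudfs.rb".
            try (JarOutputStream out = new JarOutputStream(Files.newOutputStream(jar))) {
                out.putNextEntry(new JarEntry(entryName));
                out.write(content.getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }

            // Look the entry up by its relative name through a loader that sees only this jar.
            try (URLClassLoader loader = new URLClassLoader(new URL[] {jar.toUri().toURL()}, null);
                 InputStream is = loader.getResourceAsStream(entryName)) {
                if (is == null) {
                    return null;
                }
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[4096];
                int n;
                while ((n = is.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                return new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String scriptPath = "/home/user/scriptingudfs.rb"; // made-up path for illustration
        System.out.println(toJarEntryName(scriptPath));
        System.out.println(roundTrip(scriptPath, "puts 'hello'"));
    }
}
```

A relative script path passes through toJarEntryName() unchanged, so the same normalization could be applied to every registered script regardless of how its path was given.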

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (PIG-2745) Pig e2e test RubyUDFs fails in MR mode when running from tarball

Posted by "Daniel Dai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/PIG-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-2745:
----------------------------

    Attachment: Test001.java
    
> Pig e2e test RubyUDFs fails in MR mode when running from tarball
> ----------------------------------------------------------------
>
>                 Key: PIG-2745
>                 URL: https://issues.apache.org/jira/browse/PIG-2745
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.1
>            Reporter: Cheolsoo Park
>         Attachments: PIG-2745-2.patch, PIG-2745.patch, Test001.java
