Posted to dev@pig.apache.org by "Julien Le Dem (JIRA)" <ji...@apache.org> on 2012/06/06 00:26:23 UTC

[jira] [Commented] (PIG-2665) Bundled Jython jar in Pig 0.10.0-RC breaks module import in Python scripts with embedded Pig Latin

    [ https://issues.apache.org/jira/browse/PIG-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289795#comment-13289795 ] 

Julien Le Dem commented on PIG-2665:
------------------------------------

Looks good to me
+1
                
> Bundled Jython jar in Pig 0.10.0-RC breaks module import in Python scripts with embedded Pig Latin
> --------------------------------------------------------------------------------------------------
>
>                 Key: PIG-2665
>                 URL: https://issues.apache.org/jira/browse/PIG-2665
>             Project: Pig
>          Issue Type: Bug
>    Affects Versions: 0.10.0
>         Environment: Verified bug on RHEL6 and on Ubuntu 11.10 with Sun JDK 1.6, and both Jython 2.5.0 (shipped with the Pig 0.10.0 RC package) and Jython 2.5.2.
>            Reporter: Michael Noll
>            Assignee: Daniel Dai
>             Fix For: 0.11
>
>         Attachments: PIG-2665-1.patch
>
>
> Using Pig 0.9.0 I was running into PIG-1824 when using import statements (e.g. {{import os}}) in a Python script with embedded Pig Latin.  Dmitriy Ryaboy pointed me to the new Pig 0.10 release candidate (http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz) so that I could test whether my issue was solved by the new Pig version.  During testing I ran into the error described below.
> *Summary (TL;DR)*
> * Even a minimal Python script with embedded Pig Latin will throw an error if there is a single import statement in the Python code.
> * The fix is to replace the bundled {{lib/jython.jar}} with a standalone version of the same jar.
> *Error message: "ERROR 1121: Python Error (ImportError: No module named <yourmodule>)"*
> {code}
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py 
> 2012-04-24 11:20:44,224 [main] INFO  org.apache.pig.Main - Apache Pig version 0.10.0 (r1328203) compiled Apr 19 2012, 22:54:12
> [...snip...]
> *sys-package-mgr*: can't create package cache dir, '/path/to/pig-0.10.0-RC1/lib/cachedir/packages'
> 2012-04-24 11:20:44,816 [main] INFO  org.apache.pig.scripting.jython.JythonScriptEngine - created tmp python.cachedir=/tmp/pig_jython_4081589571886870123
> 2012-04-24 11:20:45,033 [main] ERROR org.apache.pig.Main - ERROR 1121: Python Error. Traceback (most recent call last):
>   File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
>     import os
> ImportError: No module named os
> {code}
> In the Pig log file:
> {code}
> Error before Pig is launched
> ----------------------------
> ERROR 1121: Python Error. Traceback (most recent call last):
>   File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
>     import os
> ImportError: No module named os
> org.apache.pig.backend.executionengine.ExecException: ERROR 1121: Python Error. Traceback (most recent call last):
>   File "/home/mnoll/pig10rc/rctest.py", line 5, in <module>
>     import os
> ImportError: No module named os
>         at org.apache.pig.scripting.jython.JythonScriptEngine$Interpreter.execfile(JythonScriptEngine.java:210)
>         at org.apache.pig.scripting.jython.JythonScriptEngine.load(JythonScriptEngine.java:384)
>         at org.apache.pig.scripting.jython.JythonScriptEngine.main(JythonScriptEngine.java:368)
>         at org.apache.pig.scripting.ScriptEngine.run(ScriptEngine.java:275)
>         at org.apache.pig.Main.runEmbeddedScript(Main.java:929)
>         at org.apache.pig.Main.run(Main.java:510)
>         at org.apache.pig.Main.main(Main.java:111)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>         at java.lang.reflect.Method.invoke(Method.java:597)
>         at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> Caused by: Traceback (most recent call last):
> {code}
> *How to reproduce*
> Create a simple Python script that uses embedded Pig Latin AND that imports Python standard modules (any import statement will trigger the error):
> {code}
> #!/usr/bin/python 
> from org.apache.pig.scripting import Pig 
> # this import statement will trigger the error;
> # remove it and everything will work fine
> import os
> if __name__ == "__main__":
>     pig_script = """
>         set job.name 'Pig 0.10.0-RC1 Python test';
>     """
>     P = Pig.compile(pig_script)
>     bound = P.bind()
>     result = bound.runSingle()
>     if result.isSuccessful():
>         print "Pig job succeeded"
>     else:
>         # raise a real exception instead of a deprecated string exception
>         raise RuntimeError("Pig job failed")
> {code}
> Then proceed as follows.
> {code}
> #
> # Install the Pig 0.10.0 release candidate [1].
> #
> # run the Python test script
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py 
> #
> # see section above for error message
> #
> {code}
> *Test Environment*
> In addition to the details in the "Environment" JIRA field, please note that none of the TaskTracker boxes in my test cluster has Pig or Jython installed.  Pig with Jython is only available on a gateway box from which analysis jobs are run.
> *Bug description*
> During my investigation I discovered that the {{jython.jar}} that is shipped with the 0.10.0 RC package is NOT a standalone version of Jython.  I compared (diffed) the unpacked contents of the existing jython.jar with a standalone jar for Jython 2.5.0, and noticed that the main difference is that the standalone jar comes with a {{Lib/}} directory containing the various Python standard modules:
> {code}
> $ diff -r jython2.5.0 jython2.5.0-standalone/
> Only in jython2.5.0-standalone/: Lib
> diff -r jython2.5.0/META-INF/MANIFEST.MF jython2.5.0-standalone//META-INF/MANIFEST.MF
> 2a3
> > Built-By: frank
> 5d5
> < Built-By: frank
> 8,10d7
> < version: 2.5.0
> < svn-build: true
> < oracle: true
> 11a9
> > svn-build: true
> 13d10
> < jdk-target-version: 1.5
> 14a12,14
> > oracle: true
> > version: 2.5.0
> > jdk-target-version: 1.5
> {code}
> The essential difference is the missing {{Lib/}} directory in the non-standalone jar.
> {code}
> $ ls -l jython2.5.0-standalone/Lib
> total 5236
> -rw-r--r-- 1 mnoll mnoll  33417 2012-04-24 09:28 aifc.py
> -rw-r--r-- 1 mnoll mnoll   2620 2012-04-24 09:28 anydbm.py
> -rw-r--r-- 1 mnoll mnoll  11347 2012-04-24 09:28 ast.py
> -rw-r--r-- 1 mnoll mnoll  10764 2012-04-24 09:28 asynchat.py
> -rw-r--r-- 1 mnoll mnoll  17276 2012-04-24 09:28 asyncore.py
> -rw-r--r-- 1 mnoll mnoll   1631 2012-04-24 09:28 atexit.py
> -rw-r--r-- 1 mnoll mnoll  11296 2012-04-24 09:28 base64.py
> -rw-r--r-- 1 mnoll mnoll  21289 2012-04-24 09:28 BaseHTTPServer.py
> -rw-r--r-- 1 mnoll mnoll  20143 2012-04-24 09:28 bdb.py
> [...snip...]
> {code}
> Apparently Jython (and thereby Pig) requires these Python module files to be included in the {{jython.jar}} file -- at least in cluster environments where TaskTrackers DO NOT have Pig or Jython installed.
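The comparison above can also be done without unpacking the jars. The following is a minimal sketch, not part of the original report: it assumes a regular Python 3 interpreter is available on the gateway box, and it treats the presence of Lib/os.py inside the jar as the sign of a standalone build. The function name and the probe path are hypothetical choices made for illustration.

```python
import zipfile


def is_standalone_jython_jar(jar_path, probe="Lib/os.py"):
    """Return True if the jar appears to bundle the Python standard library.

    Assumption: a standalone Jython jar ships a Lib/ directory, so the
    presence of Lib/os.py is used as the marker. A jar is a zip archive,
    so the standard zipfile module can list its entries.
    """
    with zipfile.ZipFile(jar_path) as jar:
        return probe in jar.namelist()
```

Calling is_standalone_jython_jar("/path/to/pig-0.10.0-RC1/lib/jython.jar") should return False for the bundled jar described in this issue and True for a standalone jar.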
> *How to fix*
> In the Pig release package replace the {{jython.jar}} in {{lib/}} with a standalone version of the same jar.
> Here's how I created the standalone version of Jython 2.5.0 on my box:
> {code}
> $ java -jar jython_installer-2.5.0.jar -s -d /tmp/jython-install -t standalone -j $JAVA_HOME
> {code}
> This will create the standalone jar in {{/tmp/jython-install/jython.jar}}.  Place this file into {{$PIG_HOME/lib/}}, thereby overwriting the existing (non-standalone) version.  After that the Python test script above will work successfully.
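The replacement step can be scripted so that the original jar is kept around. This is only a sketch under stated assumptions: it is meant to be run with a regular Python 3 interpreter, the function name and the ".orig" backup suffix are hypothetical, and the target file name jython.jar is taken from the description above. The backup suffix is chosen so that the copy does not end in .jar, on the assumption that the launcher builds its classpath from lib/*.jar.

```python
import os
import shutil


def install_standalone_jar(standalone_jar, pig_home,
                           jar_name="jython.jar", backup_suffix=".orig"):
    """Copy standalone_jar over <pig_home>/lib/<jar_name>, keeping a backup.

    Returns the path of the backup file, or None if no jar was present
    to back up. All names and defaults here are illustrative.
    """
    target = os.path.join(pig_home, "lib", jar_name)
    backup = None
    if os.path.exists(target):
        # Keep the original (non-standalone) jar so the change can be undone.
        backup = target + backup_suffix
        shutil.copy2(target, backup)
    # Overwrite the bundled jar with the standalone build.
    shutil.copy2(standalone_jar, target)
    return backup
```

For the setup described here, the call would be install_standalone_jar("/tmp/jython-install/jython.jar", os.environ["PIG_HOME"]).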
> For completeness I also want to mention that I observed the following WARN messages before and after the Pig job was actually executed in the cluster:
> {code}
> $ /path/to/pig-0.10.0-RC1/bin/pig rctest.py 
> [...snip...]
> # before job submission
> #
> 2012-04-24 14:16:58,463 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - jython cachedir skipped, jython may not work
> 2012-04-24 14:16:58,467 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
> 2012-04-24 14:16:58,467 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: os.path, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,467 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: posixpath, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,468 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: stat, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
> # after the job finished (and succeeded)
> #
> 2012-04-24 14:16:58,548 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: os, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/os.py
> 2012-04-24 14:16:58,548 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: os.path, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,548 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: posixpath, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/posixpath.py
> 2012-04-24 14:16:58,548 [main] WARN  org.apache.pig.scripting.jython.JythonScriptEngine - module file does not exist: stat, /path/to/pig-0.10.0-RC1/lib/jython-2.5.0-standalone.jar/Lib/stat.py
> {code}
> *Jython 2.5.0 vs. Jython 2.5.2*
> FWIW I also tested whether switching to Jython 2.5.2 (up from 2.5.0 as bundled with the Pig 0.10 RC package) changes the results.  It did not.  That is, the Python script fails with the non-standalone 2.5.2 jar but works with the standalone 2.5.2 jar.
> Best,
> Michael
> PS: Is there a reason Jython version 2.5.0 is bundled instead of the latest stable release 2.5.2?
> PPS: The 0.10.0-RC did solve my original PIG-1824 problem.  I could run the problematic Python/Pig script successfully using the 0.10.0-RC with a standalone Jython 2.5.0 jar. Cool!
> [1] http://people.apache.org/~daijy/pig-0.10.0-candidate-0/pig-0.10.0.tar.gz
