Posted to dev@pig.apache.org by "Ido Hadanny (JIRA)" <ji...@apache.org> on 2015/06/07 16:10:00 UTC

[jira] [Commented] (PIG-4124) Command for Python streaming udf should be configurable

    [ https://issues.apache.org/jira/browse/PIG-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14576246#comment-14576246 ] 

Ido Hadanny commented on PIG-4124:
----------------------------------

[~msukmanowsky], [~cheolsoo] - sorry for reviving this after such a long time, but I'm a big fan of installing Python modules in virtualenvs, then shipping them and using them in a Python UDF. This is how I'm currently doing it: https://ihadanny.wordpress.com/2014/12/01/python-virtualenv-with-pig-streaming/ . Can you recommend an easier way to do it? Can you share what you are using? Or would you prefer that I open a Stack Overflow question about this?

Thanks!

> Command for Python streaming udf should be configurable
> -------------------------------------------------------
>
>                 Key: PIG-4124
>                 URL: https://issues.apache.org/jira/browse/PIG-4124
>             Project: Pig
>          Issue Type: Improvement
>            Reporter: Cheolsoo Park
>            Assignee: Cheolsoo Park
>             Fix For: 0.14.0
>
>         Attachments: PIG-4124-1.patch, PIG-4124-2.patch
>
>
> In my cluster, multiple versions of python are installed such as python2.6, python2.7, etc. Since some modules are only available on non-default python versions, it would be nice if the python command could be configurable by the user.
> For example, I have a streaming udf that imports pytz. It fails with the following error if it runs with {{python}}:
> {code}
> : Caused by: org.apache.pig.impl.streaming.StreamingUDFException: LINE 4: ImportError: No module named pytz
> : File /mnt1/var/lib/hadoop/nm-local-dir/usercache/cheolsoop/appcache/application_1407968511815_0021/container_1407968511815_0021_01_001322/tmp/udfs.py, line 4, in <module>
> : import pytz
> : at org.apache.pig.impl.builtin.StreamingUDF$ProcessErrorThread.run(StreamingUDF.java:519)
> {code}
> But it works if I use {{python2.7}} as the command.
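
To see which interpreter on a node can actually import a module such as pytz, something like the following could be run there. This is only a rough sketch and not part of Pig or of the attached patches: the helper names `can_import` and `interpreters_with` are made up for illustration, the candidate list simply repeats the interpreter names mentioned in the description, and the script itself assumes Python 3.5 or later for `subprocess.run`.

```python
import subprocess


def can_import(python_cmd, module):
    """Return True if running `python_cmd -c "import <module>"` exits with status 0."""
    try:
        result = subprocess.run(
            [python_cmd, "-c", "import " + module],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
    except OSError:
        # The interpreter is not installed or not on PATH.
        return False
    return result.returncode == 0


def interpreters_with(module, candidates):
    """Return the candidate commands that can import the module, in the given order."""
    return [cmd for cmd in candidates if can_import(cmd, module)]


if __name__ == "__main__":
    # Hypothetical candidates, taken from the interpreter names in the description.
    print(interpreters_with("pytz", ["python", "python2.6", "python2.7"]))
```

Any command it reports is a candidate for the streaming UDF command. Note that it only checks the machine it runs on, so the task nodes may still differ.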



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)