Posted to user@pig.apache.org by Chun Yang <cy...@contractor.salesforce.com> on 2012/06/16 03:30:19 UTC

Importing python modules in embedded pig

Hi all,

I'm trying to run the Mahout canopy clustering algorithm through a
Python-embedded Pig script. The embedded Pig part of the script works (using
compileFromFile, bind, runSingle), but I can't figure out how to run Mahout
from the same script. Originally I tried running Mahout via subprocess.call,
but when I try to import subprocess, I get:

ImportError: No module named subprocess

Similar errors occur when I try to import sys or os modules.

Next I tried instantiating the CanopyDriver class directly, but got a
similar error from the following import statement:

from org.apache.mahout.clustering.canopy import CanopyDriver

#=> ImportError: No module named mahout

The ImportErrors don't occur when I run Python interactively. Is this a
Jython problem? Am I not setting some path properly?
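A quick way to see exactly which imports fail when the script runs under `pig myscript.py`, as opposed to an interactive CPython session, is a small probe like the one below. It is a minimal sketch: `check_imports` is a hypothetical helper name, not part of Pig, and it relies only on the builtin `__import__`, so it should not itself depend on the Python standard library being bundled.

```python
# Minimal import probe: report which modules can be imported in the
# current interpreter (CPython, or Jython when launched via `pig myscript.py`).
# check_imports is a hypothetical helper, not part of Pig or Jython.

def check_imports(names):
    """Try to import each dotted module name; return (available, missing)."""
    available = []
    missing = []
    for name in names:
        try:
            __import__(name)  # builtin, so no standard-library module is needed
            available.append(name)
        except ImportError:
            missing.append(name)
    return available, missing


available, missing = check_imports(["sys", "os", "subprocess"])
print("available: %s" % available)
print("missing: %s" % missing)
```

Under Jython, Java packages are imported through the same mechanism, so passing a name such as `"org.apache.mahout.clustering.canopy"` should indicate whether the Mahout classes are visible on the classpath of the JVM running the script; treat that as an assumption to verify rather than a guarantee.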

Other possibly useful info:
- I'm including the Mahout jars in the pig.additional.jars property.
- I'm running the script using Pig, i.e., `pig myscript.py`

Thanks,
Chun


Re: Importing python modules in embedded pig

Posted by Chun Yang <cy...@contractor.salesforce.com>.
Thanks Daniel!

That was exactly what I was looking for.

Cheers,
Chun


On 6/17/12 7:07 PM, "Daniel Dai" <da...@hortonworks.com> wrote:

> I've seen the subprocess problem before. It happens because we bundle
> jython.jar instead of jython-standalone.jar; see PIG-2665.


Re: Importing python modules in embedded pig

Posted by Daniel Dai <da...@hortonworks.com>.
I've seen the subprocess problem before. It happens because we bundle
jython.jar instead of jython-standalone.jar; see PIG-2665.
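Until the bundled Jython includes the standard library, one possible workaround (not proposed in this thread, and only a sketch) is to fall back to Java's `java.lang.ProcessBuilder` when `subprocess` cannot be imported. `run_command` is a hypothetical helper; the fallback branch assumes the script runs under Jython, where the `jarray` module and Java classes can be imported without the Python standard library.

```python
# Sketch: run an external command (e.g. the mahout launcher) from a
# Pig-embedded Python script. run_command is a hypothetical helper.
# Assumption: if "import subprocess" fails because the bundled jython.jar
# lacks the Python standard library, the JVM's java.lang.ProcessBuilder
# can be used instead, since Jython imports Java classes directly.

def run_command(cmd):
    """Run cmd (a list of strings), echo its output, return the exit code."""
    try:
        import subprocess
    except ImportError:
        subprocess = None

    if subprocess is not None:
        # Standard library present (CPython, or a jython-standalone.jar build).
        return subprocess.call(cmd)

    # Jython-only fallback: these modules come from Jython core and the JVM.
    from jarray import array
    from java.lang import ProcessBuilder, String
    from java.io import BufferedReader, InputStreamReader

    builder = ProcessBuilder(array(cmd, String))  # pass a Java String[]
    builder.redirectErrorStream(True)  # merge stderr into stdout
    process = builder.start()

    # Drain output so the child process cannot block on a full pipe.
    reader = BufferedReader(InputStreamReader(process.getInputStream()))
    line = reader.readLine()
    while line is not None:
        print(line)
        line = reader.readLine()
    reader.close()
    return process.waitFor()
```

For example, `run_command(["mahout", "canopy", "--help"])` should print the canopy job's options, assuming the `mahout` launcher is on the `PATH` of the machine running Pig. Either branch returns the child's exit code, so the caller can check for failure the same way in both cases.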
