Posted to dev@avro.apache.org by "Jeremy Lewi (Commented) (JIRA)" <ji...@apache.org> on 2011/12/19 00:26:31 UTC

[jira] [Commented] (AVRO-570) python implementation of mapreduce connector

    [ https://issues.apache.org/jira/browse/AVRO-570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13171960#comment-13171960 ] 

Jeremy Lewi commented on AVRO-570:
----------------------------------

Doug,

Sorry for not responding sooner (Gmail failed to flag the email as important and it got lost in the deluge).

It looks like the Python path isn't getting set correctly in the tethered task, so it can't find the tether module. My guess is that you have a previously installed Python avro module, and that is what gets found when you do
{noformat}
from avro import tether
{noformat}
(i.e., something similar to https://issues.apache.org/jira/browse/AVRO-849).

Can you try removing any previously installed avro modules? E.g.,
{noformat}
rm -rf /usr/lib/python2.7/site-packages/avro-*
{noformat}
Of course, you'll want to substitute the correct path for your Python installation.
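
If you're not sure where an old copy might live, a rough sketch like the one below could list the candidates. It only scans the directories on sys.path for entries whose names start with "avro"; the helper name find_candidates is made up for illustration and is not part of avro, and the sketch assumes Python 3.4 or newer.

```python
import glob
import os
import sys


def find_candidates(prefix="avro", paths=None):
    """Return sorted paths under the given directories whose names start with prefix."""
    if paths is None:
        paths = sys.path
    found = set()
    for entry in paths:
        # Skip empty entries, zip files, and directories that don't exist.
        if not entry or not os.path.isdir(entry):
            continue
        pattern = os.path.join(glob.escape(entry), prefix + "*")
        found.update(glob.glob(pattern))
    return sorted(found)


if __name__ == "__main__":
    # Print every avro-looking file or directory reachable from sys.path.
    for path in find_candidates():
        print(path)
```

Anything this prints outside your build tree is a candidate for the stale install; check it before deleting it.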

One other thing to look at is the output of the Python tests. Look for a line like the following:
{noformat}
 [py-test] Command:
 [py-test] 	hadoop-0.20 jar /home/jlewi/svn_avro/lang/java/tools/target/avro-tools-1.6.1-job.jar tether --in /tmp/mapred/in --out /tmp/mapred/out --outschema /tmp/wordcounterN9SD.avsc --protocol http --program /tmp/exec_word_count_9vUzRg
{noformat}

The file 
{noformat}
/tmp/exec_word_count_??????
{noformat}
is a temporary file that gets written on each invocation of the tests, so the suffix will change each time. This file is a bash script that gets executed to start the tethered process.

The contents of the file should be something like
{noformat}
#!/bin/bash
export PYTHONPATH=/home/jlewi/svn_avro/lang/py/build/src:/home/jlewi/svn_avro/lang/py/build/test
python -m avro.tether.tether_task_runner word_count_task.WordCountTask
{noformat}
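
As a quick sanity check on that script, something along these lines could pull out the PYTHONPATH entries and report any that don't exist on disk. This is only a sketch: pythonpath_entries and missing_entries are hypothetical helper names, and it assumes the script contains a single "export PYTHONPATH=..." line like the one above.

```python
import os
import re
import sys


def pythonpath_entries(script_text):
    """Return the directories from the first 'export PYTHONPATH=...' line."""
    match = re.search(r"^\s*export\s+PYTHONPATH=(.*)$", script_text, re.MULTILINE)
    if match is None:
        return []
    # Drop surrounding whitespace and optional quotes, then split on ':'.
    value = match.group(1).strip().strip("\"'")
    return [p for p in value.split(":") if p]


def missing_entries(script_text):
    """Return the PYTHONPATH entries that are not existing directories."""
    return [p for p in pythonpath_entries(script_text) if not os.path.isdir(p)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Pass the path of the generated script as the first argument.
    with open(sys.argv[1]) as handle:
        text = handle.read()
    print("entries:", pythonpath_entries(text))
    print("missing:", missing_entries(text))
```

An empty "missing" list only says the directories exist; it doesn't say they come before an older avro on the path.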

Can you take a look at your file and verify that the Python path is set correctly in your case? For example, you could try executing the following in a shell
{noformat}
export PYTHONPATH=/home/jlewi/svn_avro/lang/py/build/src:/home/jlewi/svn_avro/lang/py/build/test
{noformat}
then start an interactive Python session and execute
{noformat}
from avro import tether
{noformat}
If you get an exception, then there's a problem importing avro because either 1) the path isn't set correctly or 2) there's an older version of avro with higher precedence on the path. If it's the latter, you can try the following commands to identify which avro it's picking up
{noformat}
import avro
avro.__file__
{noformat}
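
On a newer interpreter the same check can be done without actually importing the module. The sketch below assumes Python 3.4 or later (on the 2.7 setup discussed here, the two lines above are the way to go), and module_origin is just an illustrative name, not something avro provides.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be loaded from, or None if it can't be found."""
    try:
        spec = importlib.util.find_spec(name)
    except ImportError:
        # Raised when a parent package (e.g. 'avro' for 'avro.tether') is missing.
        return None
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    # Show where avro and its tether subpackage would come from, if anywhere.
    for name in ("avro", "avro.tether"):
        print(name, "->", module_origin(name))
```

If "avro" resolves to a site-packages location while "avro.tether" comes back as None, that would be consistent with an older install shadowing the build directory.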


J

                
> python implementation of mapreduce connector
> --------------------------------------------
>
>                 Key: AVRO-570
>                 URL: https://issues.apache.org/jira/browse/AVRO-570
>             Project: Avro
>          Issue Type: New Feature
>          Components: python
>    Affects Versions: 1.6.0
>            Reporter: Doug Cutting
>            Assignee: Jeremy Lewi
>            Priority: Critical
>              Labels: hadoop
>             Fix For: 1.7.0
>
>         Attachments: AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch, AVRO-570.patch
>
>
> AVRO-512 defines protocols for implementing mapreduce tasks.  It would be good to have a Python implementation of this.
