Posted to hdfs-user@hadoop.apache.org by jamal sasha <ja...@gmail.com> on 2013/08/16 01:23:52 UTC

executing linux command from hadoop (python)

Hi,
 Let's say that I have data which interacts with a REST API, like

%curl hostname data

Now, I have the following script:

#!/usr/bin/env python
import os
import sys

cmd = """curl http://localhost --data '"""
data = ""
for line in sys.stdin:
    # strip the trailing newline, then accumulate the payload
    data += line.rstrip("\n")

os.system(cmd + data + "'")


Now, if I give a sample file as data and run the above script with

cat data.txt | python mapper.py

It works perfectly. But will this work if I execute it on Hadoop as well?
I am trying to set up Hadoop in local mode to check it out, but I think it
will take me some time to get there.
Any experiences or suggestions?
Thanks

Re: executing linux command from hadoop (python)

Posted by Harsh J <ha...@cloudera.com>.
Yes, it would work with Streaming, but note that if your os.system(…)
call produces any stdout prints, they are treated as task output and
are sent to HDFS/reducers.

P.S. I assume the example you've produced is naive, but if it is not,
reconsider appending all those strings together. You don't want to hold
that much data in memory when running over large files, nor would a
command line support lengths as long as, say, a 64 MB input block.
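A safer shape for such a mapper (a sketch, not from the thread; the endpoint name is hypothetical) is to issue one request per input line through subprocess with an argv list: no shell quoting to get wrong, constant memory regardless of split size, and curl's response silenced so it does not end up in the task's stdout:

```python
import subprocess

ENDPOINT = "http://localhost"  # hypothetical; substitute your REST endpoint


def post_line(line, endpoint=ENDPOINT):
    # An argv list instead of a shell string: no quoting bugs, no injection.
    # --silent and --output /dev/null keep curl's response off the task's
    # stdout, which Streaming would otherwise treat as mapper output.
    return subprocess.call(
        ["curl", "--silent", "--output", "/dev/null", "--data", line, endpoint]
    )


def run(stream, poster=post_line):
    # One request per line: memory stays constant no matter how large the
    # input split is, and no single command line grows unboundedly.
    sent = 0
    for line in stream:
        line = line.rstrip("\n")
        if line:
            poster(line)
            sent += 1
    return sent
```

Used as a Streaming mapper, the script would end with `run(sys.stdin)`; locally you can smoke-test it the same way, with `cat data.txt | python mapper.py`.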

On Fri, Aug 16, 2013 at 4:53 AM, jamal sasha <ja...@gmail.com> wrote:
> Hi,
> Let's say that I have data which interacts with a REST API, like
>
> %curl hostname data
>
> Now, I have the following script:
>
> #!/usr/bin/env python
> import os
> import sys
>
> cmd = """curl http://localhost --data '"""
> data = ""
> for line in sys.stdin:
>     # strip the trailing newline, then accumulate the payload
>     data += line.rstrip("\n")
>
> os.system(cmd + data + "'")
>
>
> Now, if I give a sample file as data and run the above script with
>
> cat data.txt | python mapper.py
>
> It works perfectly. But will this work if I execute it on Hadoop as well?
> I am trying to set up Hadoop in local mode to check it out, but I think it
> will take me some time to get there.
> Any experiences or suggestions?
> Thanks
>



-- 
Harsh J
