python streaming error
Posted to common-dev@hadoop.apache.org by springring <sp...@126.com> on 2013/01/12 09:30:57 UTC
Hi,
When I run the code below as a streaming job, the job errors with status N/A and is killed. Running it step by step,
I find it fails at "file_obj = open(file)". When I run the same code outside of Hadoop, everything is OK.
#!/bin/env python

import sys

for line in sys.stdin:
    offset,filename = line.split("\t")
    file = "hdfs://user/hdfs/catalog3/" + filename
    print line
    print filename
    print file
    file_obj = open(file)
..................................
Re:Re:Re: Re: python streaming error
Posted by springring <sp...@126.com>.
sorry
the error keeps on, even when I modify the code to
"offset,filename = line.strip().split("\t")"
Re:Re: Re: python streaming error
Posted by springring <sp...@126.com>.
hi,
I found the key point. It is not the hostname; that part is right.
Just change "offset,filename = line.split("\t")" to
"offset,filename = line.strip().split("\t")"
and now it passes.
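For anyone hitting the same thing: lines read from sys.stdin keep their trailing newline, so without strip() the newline ends up glued to the filename and the constructed HDFS path is wrong. A minimal illustration (not from the original thread):

```python
# A streaming mapper receives lines like "0\tpart-00000\n"
# (the filename used here is illustrative).
line = "0\tpart-00000\n"

# Without strip(): the trailing newline survives the split
# and corrupts any path built from the filename.
offset, filename = line.split("\t")
assert filename == "part-00000\n"

# With strip(): the newline is removed before splitting.
offset, filename = line.strip().split("\t")
assert filename == "part-00000"
```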
Re: Re: python streaming error
Posted by Nitin Pawar <ni...@gmail.com>.
computedb-13 is not a valid host name
maybe if you have a local Hadoop then you can refer to it with
hdfs://localhost:9100/ or hdfs://127.0.0.1:9100
if it's on another machine then just try the IP address of that machine
--
Nitin Pawar
Re:Re: python streaming error
Posted by springring <sp...@126.com>.
hi,
I modified the file as below; there is still an error
#!/bin/env python

import sys

for line in sys.stdin:
    offset,filename = line.split("\t")
    file = "hdfs://computeb-13:9100/user/hdfs/catalog3/" + filename
    print line
    print filename
    print file
    file_obj = open(file)
Re: python streaming error
Posted by Nitin Pawar <ni...@gmail.com>.
is this the correct path for writing onto HDFS?
"hdfs://user/hdfs/catalog3."
I don't see the namenode info in the path. Can this cause an issue? Just
making a guess
it should be something like hdfs://host:port/path
On Sat, Jan 12, 2013 at 12:30 AM, springring <sp...@126.com> wrote:
> hdfs://user/hdfs/catalog3/
--
Nitin Pawar
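As an aside (a sketch, not part of the original exchange), a URL parser makes the missing-namenode point concrete: in hdfs://user/hdfs/catalog3/ the first path component lands in the host position, so "user" is treated as the namenode and silently drops out of the path.

```python
from urllib.parse import urlparse

# Without host:port, "user" is parsed as the namenode host,
# not as a directory.
bad = urlparse("hdfs://user/hdfs/catalog3/")
assert bad.netloc == "user"
assert bad.path == "/hdfs/catalog3/"

# With an explicit namenode (hostname here is illustrative),
# the intended path is preserved.
good = urlparse("hdfs://namenode.example.com:9100/user/hdfs/catalog3/")
assert good.netloc == "namenode.example.com:9100"
assert good.path == "/user/hdfs/catalog3/"
```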
Re: python streaming error
Posted by Simone Leo <si...@crs4.it>.
Hello,
you can use the Pydoop HDFS API to work with HDFS files:
>>> import pydoop.hdfs as hdfs
>>> with hdfs.open('hdfs://localhost:8020/user/myuser/filename') as f:
...     for line in f:
...         do_something(line)
As you can see, the API is very similar to that of ordinary Python file
objects. Check out the following tutorial for more details:
http://pydoop.sourceforge.net/docs/tutorial/hdfs_api.html
Note that Pydoop also has a MapReduce API, so you can use it to rewrite
the whole program:
http://pydoop.sourceforge.net/docs/tutorial/mapred_api.html
It also has a more compact and easy-to-use scripting engine for simple
applications:
http://pydoop.sourceforge.net/docs/tutorial/pydoop_script.html
If you think Pydoop is right for you, read the installation guide:
http://pydoop.sourceforge.net/docs/installation.html
Simone
--
Simone Leo
Data Fusion - Distributed Computing
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: simone.leo@crs4.it
http://www.crs4.it
Re: python streaming error
Posted by Andy Isaacson <ad...@cloudera.com>.
Oh, another link I should have included!
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
-andy
Re: python streaming error
Posted by Andy Isaacson <ad...@cloudera.com>.
Hadoop Streaming does not magically teach Python open() how to read
from "hdfs://" URLs. You'll need to use a library or fork a "hdfs dfs
-cat" to read the file for you.
A few links that may help:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
https://bitbucket.org/turnaev/cyhdfs
-andy
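One way to do the "fork a hdfs dfs -cat" approach from a streaming mapper is to pipe the subprocess's stdout; a rough Python 3 sketch (the helper names and error handling are mine, not from the thread; it assumes the hdfs CLI is on the task's PATH):

```python
import subprocess

def stream_command(argv):
    """Yield stdout lines (as bytes) from a subprocess, raising if it fails."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        if proc.wait() != 0:
            raise IOError("command failed: %s" % " ".join(argv))

def hdfs_cat(path):
    """Read an HDFS file by forking `hdfs dfs -cat <path>`."""
    return stream_command(["hdfs", "dfs", "-cat", path])
```

In the mapper, `for line in hdfs_cat(file): ...` would then replace the failing `open(file)` call.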