python streaming error
Posted to common-dev@hadoop.apache.org by springring <sp...@126.com> on 2013/01/12 09:30:57 UTC
Hi,
When I run the code below as a streaming job, the job errors with status N/A and is killed. Running it step by step,
I find it fails at "file_obj = open(file)". When I run the same code outside of Hadoop, everything is OK.
#!/bin/env python

import sys

for line in sys.stdin:
    offset,filename = line.split("\t")
    file = "hdfs://user/hdfs/catalog3/" + filename
    print line
    print filename
    print file
    file_obj = open(file)
..................................
Re:Re:Re: Re: python streaming error
Posted by springring <sp...@126.com>.
sorry
the error keeps on, even when I modify the code to
"offset,filename = line.strip().split("\t")"
Re:Re: Re: python streaming error
Posted by springring <sp...@126.com>.
hi,
I found the key point. It is not the hostname; that part is right.
Just change "offset,filename = line.split("\t")" to
"offset,filename = line.strip().split("\t")"
and now it passes.
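For anyone hitting the same thing: lines read from sys.stdin keep their trailing newline, so without strip() the newline ends up glued to the filename and the constructed HDFS path is wrong. A minimal illustration (not from the original thread):

```python
# A streaming mapper receives lines like "0\tpart-00000\n"
# (the filename used here is illustrative).
line = "0\tpart-00000\n"

# Without strip(): the trailing newline survives the split
# and corrupts any path built from the filename.
offset, filename = line.split("\t")
assert filename == "part-00000\n"

# With strip(): the newline is removed before splitting.
offset, filename = line.strip().split("\t")
assert filename == "part-00000"
```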
Re: Re: python streaming error
Posted by Nitin Pawar <ni...@gmail.com>.
computedb-13 is not a valid host name
maybe if you have a local Hadoop then you can refer to it with
hdfs://localhost:9100/ or hdfs://127.0.0.1:9100
if it's on another machine then just try the IP address of that machine
--
Nitin Pawar
Re:Re: python streaming error
Posted by springring <sp...@126.com>.
hi,
I modified the file as below; there is still an error
#!/bin/env python

import sys

for line in sys.stdin:
    offset,filename = line.split("\t")
    file = "hdfs://computeb-13:9100/user/hdfs/catalog3/" + filename
    print line
    print filename
    print file
    file_obj = open(file)
Re: python streaming error
Posted by Nitin Pawar <ni...@gmail.com>.
is this the correct path for writing onto HDFS?
"hdfs://user/hdfs/catalog3."
I don't see the namenode info in the path. Can this cause an issue? Just
making a guess
it should be something like hdfs://host:port/path
On Sat, Jan 12, 2013 at 12:30 AM, springring <sp...@126.com> wrote:
> hdfs://user/hdfs/catalog3/
--
Nitin Pawar
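As an aside (a sketch, not part of the original exchange), a URL parser makes the missing-namenode point concrete: in hdfs://user/hdfs/catalog3/ the first path component lands in the host position, so "user" is treated as the namenode and silently drops out of the path.

```python
from urllib.parse import urlparse

# Without host:port, "user" is parsed as the namenode host,
# not as a directory.
bad = urlparse("hdfs://user/hdfs/catalog3/")
assert bad.netloc == "user"
assert bad.path == "/hdfs/catalog3/"

# With an explicit namenode (hostname here is illustrative),
# the intended path is preserved.
good = urlparse("hdfs://namenode.example.com:9100/user/hdfs/catalog3/")
assert good.netloc == "namenode.example.com:9100"
assert good.path == "/user/hdfs/catalog3/"
```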
Re: python streaming error
Posted by Simone Leo <si...@crs4.it>.
Hello,
you can use the Pydoop HDFS API to work with HDFS files:
>>> import pydoop.hdfs as hdfs
>>> with hdfs.open('hdfs://localhost:8020/user/myuser/filename') as f:
...     for line in f:
...         do_something(line)
As you can see, the API is very similar to that of ordinary Python file
objects. Check out the following tutorial for more details:
http://pydoop.sourceforge.net/docs/tutorial/hdfs_api.html
Note that Pydoop also has a MapReduce API, so you can use it to rewrite
the whole program:
http://pydoop.sourceforge.net/docs/tutorial/mapred_api.html
It also has a more compact and easy-to-use scripting engine for simple
applications:
http://pydoop.sourceforge.net/docs/tutorial/pydoop_script.html
If you think Pydoop is right for you, read the installation guide:
http://pydoop.sourceforge.net/docs/installation.html
Simone
--
Simone Leo
Data Fusion - Distributed Computing
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: simone.leo@crs4.it
http://www.crs4.it
Re: python streaming error
Posted by Andy Isaacson <ad...@cloudera.com>.
Oh, another link I should have included!
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/
-andy
Re: python streaming error
Posted by Andy Isaacson <ad...@cloudera.com>.
Hadoop Streaming does not magically teach Python open() how to read
from "hdfs://" URLs. You'll need to use a library or fork a "hdfs dfs
-cat" to read the file for you.
A few links that may help:
http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
https://bitbucket.org/turnaev/cyhdfs
-andy
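One way to do the "fork a hdfs dfs -cat" approach from a streaming mapper is to pipe the subprocess's stdout; a rough Python 3 sketch (the helper names and error handling are mine, not from the thread; it assumes the hdfs CLI is on the task's PATH):

```python
import subprocess

def stream_command(argv):
    """Yield stdout lines (as bytes) from a subprocess, raising if it fails."""
    proc = subprocess.Popen(argv, stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        proc.stdout.close()
        if proc.wait() != 0:
            raise IOError("command failed: %s" % " ".join(argv))

def hdfs_cat(path):
    """Read an HDFS file by forking `hdfs dfs -cat <path>`."""
    return stream_command(["hdfs", "dfs", "-cat", path])
```

In the mapper, `for line in hdfs_cat(file): ...` would then replace the failing `open(file)` call.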