Posted to user@pig.apache.org by Haider <ha...@gmail.com> on 2013/12/02 14:22:20 UTC

listdir() python function is not working on hadoop

Hi all,

Has anyone successfully used the listdir() function to retrieve files one by
one from HDFS in a Python script?


import os

if __name__ == '__main__':
    # Print the name of every entry in the directory
    for filename in os.listdir("/user/hdmaster/XML2"):
        print filename

ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks
exceeded allowed limit. FailedCount: 1. LastFailedTask:
task_201312020139_0025_m_000000
13/12/02 05:20:50 INFO streaming.StreamJob: killJob...

My intention is to take the files one by one and parse them.

Any help or suggestions would be much appreciated.

Thanks
Haider

Re: listdir() python function is not working on hadoop

Posted by Haider <ha...@gmail.com>.
Running "python setup.py build" gives this error:

Packaging Java classes
sh: 1: jar: not found
error: Error packaging java component. Command: jar -cf
build/lib.linux-i686-2.7/pydoop/pydoop_1_1_2.jar -C
build/temp.linux-i686-2.7/pipes-1.1.2 ./it
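
The message "sh: 1: jar: not found" means the shell could not find the jar command, which ships with a JDK rather than a JRE. A minimal sketch for checking this before re-running the build is below. It is written for Python 3, and the helper name find_jar_tool and its parameters are made up for illustration; they are not part of Pydoop.

```python
import os
import shutil


def find_jar_tool(path=None, java_home=None):
    """Return the location of the `jar` executable, or None if it is missing.

    path      -- search path string; None means the current PATH
    java_home -- optional JDK directory to check as a fallback
    """
    found = shutil.which("jar", path=path)
    if found:
        return found
    if java_home:
        # A JDK keeps its command-line tools under <java_home>/bin
        candidate = os.path.join(java_home, "bin", "jar")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None
```

If find_jar_tool(java_home=os.environ.get("JAVA_HOME")) returns None, installing a JDK or adding its bin directory to PATH is probably needed before the build can package the Java component.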



On Sat, Dec 7, 2013 at 12:00 PM, Nitin Pawar <ni...@gmail.com> wrote:

> Can you share the error?

Re: listdir() python function is not working on hadoop

Posted by Nitin Pawar <ni...@gmail.com>.
Can you share the error?
On Dec 7, 2013 8:49 AM, "Haider" <ha...@gmail.com> wrote:

> Hi All
>
> Thanks for your suggestions.
> But in my case I have thousands of small files and I want to read them one by
> one. I think that is only possible using listdir().
> Following Nitin's comment I tried to install Pydoop, but it throws a
> strange error and I cannot find any information about Pydoop on Google.
>
> thanks
> Haider

Re: listdir() python function is not working on hadoop

Posted by Haider <ha...@gmail.com>.
Hi All

Thanks for your suggestions.
But in my case I have thousands of small files and I want to read them one by
one. I think that is only possible using listdir().
Following Nitin's comment I tried to install Pydoop, but it throws a
strange error and I cannot find any information about Pydoop on Google.

thanks
Haider
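
For the many-small-files case, a Hadoop Streaming mapper usually does not need to list the directory itself: when the HDFS directory is given as the job input, the framework feeds the file contents to the mapper on stdin. A minimal sketch follows. It assumes the job property map.input.file (mapreduce.map.input.file on newer releases) is exported to the task environment with dots replaced by underscores, and parse_line is a hypothetical stand-in for the real parsing step.

```python
import os


def current_input_file(environ=os.environ):
    """Name of the file the current map task is reading, or 'unknown'."""
    # Hadoop Streaming exports job conf properties as environment variables,
    # replacing "." with "_"; the property name differs between releases.
    for key in ("mapreduce_map_input_file", "map_input_file"):
        if key in environ:
            return environ[key]
    return "unknown"


def parse_line(line):
    """Hypothetical placeholder for the real parsing step: trims whitespace."""
    return line.strip()


def run_mapper(lines, out, environ=os.environ):
    """Emit one 'filename<TAB>parsed line' record per non-empty input line."""
    name = current_input_file(environ)
    for line in lines:
        parsed = parse_line(line)
        if parsed:
            out.write("%s\t%s\n" % (name, parsed))
```

In the actual mapper script the entry point would call run_mapper(sys.stdin, sys.stdout), so each output record carries the name of the file it came from.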




On Sat, Dec 7, 2013 at 8:19 AM, Yigitbasi, Nezih
<ne...@intel.com> wrote:

> Haider,
> You can use TextLoader to read a file in HDFS line by line, and then you
> can pass those lines to your python UDF. Something like the following
> should work:
>
> x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
> y = foreach x generate my_udf(line);

RE: listdir() python function is not working on hadoop

Posted by "Yigitbasi, Nezih" <ne...@intel.com>.
Haider,
You can use TextLoader to read a file in HDFS line by line, and then you can pass those lines to your python UDF. Something like the following should work:

x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
y = foreach x generate my_udf(line);
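
A minimal sketch of what my_udf could look like as a Jython UDF. The file name my_udfs.py, the schema string, and the body (which only trims the line) are placeholders. outputSchema is the decorator Pig makes available to Jython UDF scripts; the fallback definition is there only so the file can also be loaded outside Pig.

```python
try:
    outputSchema  # provided by Pig when the script is registered "using jython"
except NameError:
    def outputSchema(schema):
        """Stand-in decorator so the module also loads outside Pig."""
        def decorate(func):
            return func
        return decorate


@outputSchema("parsed:chararray")
def my_udf(line):
    """Placeholder parsing step: return the line without surrounding whitespace."""
    if line is None:
        return None
    return line.strip()
```

On the Pig side the script would be registered with something like register 'my_udfs.py' using jython as udfs; and the function called as udfs.my_udf(line).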

-----Original Message-----
From: Haider [mailto:haider.nitc@gmail.com] 
Sent: Thursday, December 5, 2013 10:12 PM
To: user@pig.apache.org
Subject: Re: listdir() python function is not working on hadoop

I am trying to read from HDFS, not from the local file system, so would that be possible through listdir()? Or is there any way to read HDFS files one by one and pass each to a function?





Re: listdir() python function is not working on hadoop

Posted by Nitin Pawar <ni...@gmail.com>.
Haider, you cannot use Python's system-level functions on Hadoop directly.
You may want to take a look at the Pydoop project if you want those features.


On Fri, Dec 6, 2013 at 2:22 PM, shashwat shriparv <dwivedishashwat@gmail.com> wrote:

> I am not sure that this function can list HDFS directories and files.



-- 
Nitin Pawar
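
As a rough illustration of the point: os.listdir() only sees the local filesystem of the node a task runs on, so an HDFS listing has to come from an HDFS-aware client. One option that needs no extra library is to call the hadoop fs -ls command and parse its output. The sketch below assumes the hadoop command is on PATH and that each entry line ends with the path (paths containing spaces are not handled). Both function names are made up for this example.

```python
import subprocess


def parse_ls_output(text):
    """Extract file paths from the text printed by `hadoop fs -ls <dir>`."""
    paths = []
    for line in text.splitlines():
        fields = line.split()
        # Entry lines have 8 columns: permissions, replication, owner, group,
        # size, date, time, path. The "Found N items" header has fewer.
        if len(fields) < 8:
            continue
        # Skip sub-directories, whose permission string starts with "d".
        if fields[0].startswith("d"):
            continue
        paths.append(fields[-1])
    return paths


def list_hdfs_files(hdfs_dir):
    """Run `hadoop fs -ls` on hdfs_dir and return the file paths it reports."""
    output = subprocess.check_output(["hadoop", "fs", "-ls", hdfs_dir])
    return parse_ls_output(output.decode("utf-8"))
```

Each returned path could then be read one at a time, for example with hadoop fs -cat, or through Pydoop once it is installed.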

Re: listdir() python function is not working on hadoop

Posted by shashwat shriparv <dw...@gmail.com>.
I am not sure that this function can list HDFS directories and files.


On Fri, Dec 6, 2013 at 11:42 AM, Haider <ha...@gmail.com> wrote:

> listdir




*Thanks & Regards    *

∞
Shashwat Shriparv

Re: listdir() python function is not working on hadoop

Posted by Haider <ha...@gmail.com>.
I am trying to read from HDFS, not from the local file system, so would that be
possible through listdir()? Or is there any way to read HDFS files one by one
and pass each to a function?




On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih
<ne...@intel.com> wrote:

> I can call listdir to read from local filesystem in a python UDF. Did you
> implement your function as a proper UDF?

RE: listdir() python function is not working on hadoop

Posted by "Yigitbasi, Nezih" <ne...@intel.com>.
I can call listdir to read from local filesystem in a python UDF. Did you implement your function as a proper UDF?
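
To make the local-versus-HDFS distinction concrete, here is a small sketch (the function name is made up) that lists a directory on the local filesystem and reports a missing path instead of letting the exception end the task. An HDFS-only path such as /user/hdmaster/XML2 would normally take the error branch, because it does not exist on the task node's local disk.

```python
import os
import sys


def list_local_dir(path):
    """Return sorted entry names under a LOCAL directory, or [] if it cannot be read."""
    try:
        return sorted(os.listdir(path))
    except OSError as exc:
        # Report the problem on stderr so it shows up in the task logs
        sys.stderr.write("cannot list %s on the local filesystem: %s\n" % (path, exc))
        return []
```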