Posted to common-user@hadoop.apache.org by Dingcheng Li <di...@gmail.com> on 2015/12/06 22:46:00 UTC

Help on perl streaming

Hi, folks,

I am using Hadoop streaming to call Perl scripts as mappers, and things are
working well. But I found that reading a resource file is a problem.

Basically I think that I am on the right track: the -file option is the
correct way to ship a resource file so it can be read. I tested this with a
Python script and it works, but with Perl it always gives a file-not-found
error. I noticed that in Python "import sys" is used; I am not sure what is
needed for Perl ("use Sys" does not work). I have a simple test script as
follows:


#!/usr/bin/perl
use strict;
use warnings;

# Resource file shipped to the job with -file.
my $filter_file = "salesData/salesFilter.txt";
open(my $filter_fh, '<', $filter_file)
    or die "Could not open file '$filter_file': $!";

# The streaming framework exports the current input split's filename;
# older releases used map_input_file instead of mapreduce_map_input_file.
my $filename = $ENV{"mapreduce_map_input_file"} // "";
print STDERR "Input filename is: $filename\n";

# Read records from STDIN and emit "store<TAB>sale" pairs.
foreach (<>) {
    chomp;
    my ($store, $sale) = (split(/\s+/, $_))[2, 4];
    print "$store\t$sale\n";
}

And the command for it is:


hadoop jar /usr/lib/hadoop-mapreduce/hadoop-streaming.jar \
    -input salesData/sales.txt \
    -output out/sales-out \
    -mapper perlScripts/salesMapper.pl \
    -file perlScripts/salesMapper.pl \
    -reducer perlScripts/salesReducer.pl \
    -file perlScripts/salesReducer.pl \
    -file salesData/salesFilter.txt


Could you give me some suggestions?


Thanks,

Dingcheng

Re: Help on perl streaming

Posted by Dingcheng Li <di...@gmail.com>.
Thanks for your quick response. It seems to make sense that I should put
the resource file and the script into the same directory. Sigh, I cannot
test it now since our Hadoop environment is down for maintenance this week.
I will keep you posted on whether this works.

Thanks a lot,
Dingcheng

On Sun, Dec 6, 2015 at 6:12 PM, David Morel <dm...@amakuru.net> wrote:

> On 7 Dec 2015, at 0:21, Dingcheng Li wrote:
>
>> Without it the job works well, i.e. after I comment out the part of the
>> script that creates and reads the resource file. With Python, exactly
>> the same file structure works. I do not think that the resource file
>> ("salesData/salesFilter.txt") should be in an HDFS directory, since the
>> resource file is like a dictionary which I use to filter words from the
>> input file.
>> Thanks,
>> Dingcheng
>>
>
> So I checked the docs: the file is copied to the workdir, but not into a
> subdir, as I said.
> So removing the subdirectory from the path in your Perl script should
> work, since the data file should be in the same directory as your script.
>
> example taken from https://wiki.apache.org/hadoop/HadoopStreaming
>
> Example: $HSTREAMING -mapper "/usr/local/bin/perl5 filter.pl"
>            -file /local/filter.pl -input "/logs/0604*/*" [...]
>   Ships a script, invokes the non-shipped perl interpreter
>   Shipped files go to the working directory so filter.pl is found by perl
> ...
>
>
>
>> On Sun, Dec 6, 2015 at 5:13 PM, David Morel <dm...@amakuru.net> wrote:
>>
>>> Your file would probably not be located in a subdirectory in HDFS. Try
>>> without it?
>>> On 6 Dec 2015 at 10:46 PM, "Dingcheng Li" <di...@gmail.com> wrote:
>>>
>>>> [original message quoted in full; trimmed, see the top of the thread]
>>>
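
For reference, a minimal sketch of the fix being discussed here: since
files shipped with -file land in the task's working directory, the filter
file is opened by its basename, with no salesData/ prefix. The file and
variable names follow the original post; this is an illustration under
those assumptions, not a tested job.

#!/usr/bin/perl
use strict;
use warnings;

# salesFilter.txt was shipped with -file, so it sits in the task's
# working directory; open it by basename, not "salesData/salesFilter.txt".
my $filter_file = "salesFilter.txt";
open(my $filter_fh, '<', $filter_file)
    or die "Could not open file '$filter_file': $!";

# Load the filter terms (one per line) into a hash for quick lookups.
my %filter;
while (my $term = <$filter_fh>) {
    chomp $term;
    $filter{$term} = 1;
}
close($filter_fh);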

Re: Help on perl streaming

Posted by Dingcheng Li <di...@gmail.com>.
Without it the job works well, i.e. after I comment out the part of the
script that creates and reads the resource file. With Python, exactly the
same file structure works. I do not think that the resource file
("salesData/salesFilter.txt") should be in an HDFS directory, since the
resource file is like a dictionary which I use to filter words from the
input file.
Thanks,
Dingcheng

On Sun, Dec 6, 2015 at 5:13 PM, David Morel <dm...@amakuru.net> wrote:

> Your file would probably not be located in a subdirectory in HDFS. Try
> without it?
> On 6 Dec 2015 at 10:46 PM, "Dingcheng Li" <di...@gmail.com> wrote:
>
>> [original message quoted in full; trimmed, see the top of the thread]
>
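
To make the "dictionary" usage above concrete, here is a minimal sketch of
a mapper that loads the shipped filter file into a hash and only emits
records whose store field appears in it. It assumes one term per line in
salesFilter.txt and the whitespace-delimited field positions from the
posted script; whether the filter applies to the store field is a guess
made purely for illustration.

#!/usr/bin/perl
use strict;
use warnings;

# Assumes salesFilter.txt was shipped with -file and therefore sits in
# the task's working directory, one filter term per line.
my %filter;
open(my $fh, '<', 'salesFilter.txt')
    or die "Could not open 'salesFilter.txt': $!";
while (my $term = <$fh>) {
    chomp $term;
    $filter{$term} = 1;
}
close($fh);

# Stream input records from STDIN and emit "store<TAB>sale" pairs only
# for stores listed in the filter file.
while (my $line = <STDIN>) {
    chomp $line;
    my ($store, $sale) = (split /\s+/, $line)[2, 4];
    next unless defined $store && $filter{$store};
    print "$store\t$sale\n";
}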
