Posted to hdfs-user@hadoop.apache.org by "Balachandar R.A." <ba...@gmail.com> on 2013/05/09 08:53:17 UTC

one newbie question

Hello

I would like to explore the possibility of using the MapReduce framework for the
following problem.

I have a set of huge files, and I would like to execute a binary over each
input file. The binary needs to operate on the whole file, so the file cannot
be split into chunks. Let's assume that I have six such files and have their
names listed in a single text file. I need to write Hadoop code that takes this
single file as input and sends every line in it to its own map task. Each map
task shall execute the binary on the named file, which can be located in HDFS.
No reduce tasks are needed, and no output shall be emitted from the map tasks
either; the binary takes care of creating the output file in the specified
location.

Is there a way to tell Hadoop to feed a single line to a map task? I came
across a few examples in which a set of files is given, and it looks like the
framework tries to split each file, reads every line in the split, generates
key/value pairs, and sends these pairs to a single map task. In my situation, I
want exactly one key/value pair to be generated per line, and each pair should
be given to its own map task. That's it.

For example, assume this is my file <input.txt>:

myFirstInput.vlc
mySecondInput.vlc
myThirdInput.vlc

Now, the first map task should get the pair <1, myFirstInput.vlc>, the second
the pair <2, mySecondInput.vlc>, and so on.
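For concreteness, a rough sketch of the kind of map task intended here, using Hadoop's old mapred API. The binary path and output location are placeholders, and the class name is hypothetical:

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch only: each map() call receives one line of input.txt and runs an
// external binary on the file that line names. Paths are placeholders.
public class RunBinaryMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, NullWritable, NullWritable> {

    @Override
    public void map(LongWritable key, Text value,
                    OutputCollector<NullWritable, NullWritable> output,
                    Reporter reporter) throws IOException {
        String inputPath = value.toString().trim();
        // Assumes the binary can read the file itself (or that the file has
        // been copied to the task node's local disk beforehand).
        ProcessBuilder pb = new ProcessBuilder(
                "/usr/local/bin/mybinary", inputPath);
        pb.inheritIO();
        try {
            int rc = pb.start().waitFor();
            if (rc != 0) {
                throw new IOException("binary exited with code " + rc);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException(e);
        }
        // Nothing is collected: the binary writes its own output file.
    }
}
```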

Can someone throw some light on this problem? To me it looks
straightforward, but I could not find any pointers on the web.







With thanks and regards
Balson

Re: one newbie question

Posted by "Balachandar R.A." <ba...@gmail.com>.
Wow,

That's exactly what I want.

Thanks a lot

Balson
On 9 May 2013 13:16, "Ted Xu" <tx...@gopivotal.com> wrote:

> Hi Balson,
>
> Have you tried NLineInputFormat (<http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html>)?
> You can find an example of NLineInputFormat here: http://goo.gl/aVzDr.


Re: one newbie question

Posted by Ted Xu <tx...@gopivotal.com>.
Hi Balson,

Have you tried NLineInputFormat (<http://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/lib/NLineInputFormat.html>)?
You can find an example of NLineInputFormat here: http://goo.gl/aVzDr.
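A minimal driver sketch for a map-only job using NLineInputFormat with one line per split, written against the old mapred API that the linked Javadoc documents. The mapper class name is a placeholder for whatever mapper execs the binary:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class RunBinaryJob {
    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(RunBinaryJob.class);
        conf.setJobName("run-binary-per-line");

        // Each split contains exactly one line of input.txt, so each line
        // goes to its own map task. Note the key NLineInputFormat hands the
        // mapper is the line's byte offset, not a 1-based line number.
        conf.setInputFormat(NLineInputFormat.class);
        conf.setInt("mapred.line.input.format.linespermap", 1);

        conf.setMapperClass(RunBinaryMapper.class); // placeholder mapper name
        conf.setNumReduceTasks(0);                  // map-only job
        conf.setOutputFormat(NullOutputFormat.class); // nothing is emitted

        FileInputFormat.setInputPaths(conf, new Path("input.txt"));
        JobClient.runJob(conf);
    }
}
```

Setting the reduce count to 0 and using NullOutputFormat matches the requirement that the map tasks emit nothing and the binary writes its own output.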





-- 
Regards,
Ted Xu
