Posted to common-user@hadoop.apache.org by Jun Young Kim <ju...@gmail.com> on 2011/03/12 11:21:32 UTC

is a single thread allocated to a single output file ?

hi,

is a single thread allocated to a single output file when a job is 
trying to write multiple output files?

if a job writes 10,000 output files, does Hadoop try to create a
thread for each output file?

-- 
Junyoung Kim (juneng603@gmail.com)

Re: is a single thread allocated to a single output file ?

Posted by maha <ma...@umail.ucsb.edu>.

I found it :) MultithreadedMapper runs the map() calls of one map task in
several threads:

http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html
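The idea behind MultithreadedMapper can be modeled with the JDK alone. The sketch below is not Hadoop source: the class and method names are hypothetical, a List iterator stands in for the task's record reader, and an ArrayList stands in for its record writer. Several worker threads share both and take one lock around every read and every write, while the map work itself runs concurrently. That is roughly how MultithreadedMapper shares the task's reader and writer, and why its javadoc suggests it for map() work that is not CPU bound. In a real job the wiring would be job.setMapperClass(MultithreadedMapper.class) plus MultithreadedMapper.setMapperClass(job, YourMapper.class) and MultithreadedMapper.setNumberOfThreads(job, n), where YourMapper and n are placeholders.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Simplified, JDK-only model of the MultithreadedMapper idea (not Hadoop code).
public class MultithreadedMapSketch {

    // Applies mapFn to every input using numThreads workers and returns the outputs.
    // Output order depends on thread scheduling.
    public static <I, O> List<O> runMultithreaded(List<I> inputs, Function<I, O> mapFn, int numThreads) {
        Iterator<I> reader = inputs.iterator(); // shared stand-in for the record reader
        List<O> writer = new ArrayList<>();     // shared stand-in for the record writer
        Object lock = new Object();             // one lock guards both reader and writer

        List<Thread> workers = new ArrayList<>();
        for (int t = 0; t < numThreads; t++) {
            Thread worker = new Thread(() -> {
                while (true) {
                    I record;
                    synchronized (lock) {       // reads are serialized
                        if (!reader.hasNext()) {
                            return;             // input exhausted: this worker is done
                        }
                        record = reader.next();
                    }
                    O result = mapFn.apply(record); // map work runs concurrently
                    synchronized (lock) {       // writes are serialized too
                        writer.add(result);
                    }
                }
            }, "map-worker-" + t);
            workers.add(worker);
            worker.start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for map workers", e);
            }
        }
        return writer;
    }

    public static void main(String[] args) {
        List<Integer> lengths = runMultithreaded(List.of("a", "bb", "ccc", "dddd"), String::length, 3);
        lengths.sort(null); // workers finish in no fixed order
        System.out.println(lengths);
    }
}
```

Running main prints [1, 2, 3, 4]; the sort is there because the workers add their results in no fixed order.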

Maha

On Mar 15, 2011, at 2:18 PM, maha wrote:

> By the way, how do I know whether my map task is single threaded (i.e., one thread calling map() for each record in turn)? And how do I change it to multi-threading?
> 
> Thank you,
> Maha

Re: is a single thread allocated to a single output file ?

Posted by maha <ma...@umail.ucsb.edu>.

By the way, how do I know whether my map task is single threaded (i.e., one thread calling map() for each record in turn)? And how do I change it to multi-threading?

Thank you,
Maha

On Mar 12, 2011, at 9:11 PM, Harsh J wrote:

> The map task is generally single threaded unless you multi-thread the
> calls (in which case the record writers are still accessed in a
> synchronized fashion).

Re: is a single thread allocated to a single output file ?

Posted by Harsh J <qw...@gmail.com>.

Hello,

On Sat, Mar 12, 2011 at 3:51 PM, Jun Young Kim <ju...@gmail.com> wrote:
> hi,
>
> is a single thread allocated to a single output file when a job is trying to
> write multiple output files?

At the lower levels, a data streaming thread is indeed run for every
OutputStream created for writing on the DFS.

The map task is generally single threaded unless you multi-thread the
calls (in which case the record writers are still accessed in a
synchronized fashion).
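One empirical way to check whether map() runs in one thread is to record which thread executes each call. The JDK-only sketch below models the default task loop, which in Hadoop's Mapper.run() is essentially while (context.nextKeyValue()) { map(...); } on a single thread. The class, the run() helper and the threadsSeen set are hypothetical. In a real Mapper, logging Thread.currentThread().getName() inside map() and reading the task logs should give the same information.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// JDK-only model of the default map-task loop (not Hadoop code): one thread
// takes each record in turn and calls map() on it.
public class SingleThreadedMapSketch {

    // Names of every thread that has executed map(); its size answers
    // "how many threads touch my records?"
    public static final Set<String> threadsSeen = new LinkedHashSet<>();

    static void map(String record) {
        threadsSeen.add(Thread.currentThread().getName());
        // per-record work would go here
    }

    public static void run(List<String> records) {
        for (String record : records) { // sequential: one record at a time
            map(record);
        }
    }

    public static void main(String[] args) {
        run(List.of("r1", "r2", "r3"));
        System.out.println(threadsSeen.size() + " thread(s): " + threadsSeen);
    }
}
```

Running main prints "1 thread(s): [main]". Swapping in a multithreaded runner like the MultithreadedMapper model would make the set grow to one name per worker that actually processed a record.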

> if a job writes 10,000 output files, does Hadoop try to create a
> thread for each output file?

Yes, there should be 10,000 threads 'started' for streaming writes,
but not all of them will really be working at the same time, given
how tasks access the record writers.
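The "one streaming thread per open stream, mostly idle" behaviour can be illustrated with a JDK-only model. It is not HDFS code; ModelStream, runScenario and the sentinel-packet shutdown are hypothetical, and the background thread only loosely resembles the data-streaming thread behind a DFS OutputStream. Each opened stream starts its own thread, which blocks on an empty queue until data arrives, so opening N streams starts N threads even if only a few of them ever ship anything.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// JDK-only model (not HDFS code): every open stream owns one background
// "streamer" thread that ships queued packets. All names are hypothetical.
public class StreamerPerStreamSketch {

    static final class ModelStream implements AutoCloseable {
        private static final byte[] EOF = new byte[0]; // sentinel that stops the streamer
        private final BlockingQueue<byte[]> packets = new LinkedBlockingQueue<>();
        private final AtomicInteger packetsShipped = new AtomicInteger();
        private final Thread streamer;

        ModelStream(String name) {
            streamer = new Thread(() -> {
                try {
                    while (true) {
                        byte[] packet = packets.take(); // blocked (idle) until data arrives
                        if (packet == EOF) {
                            return;
                        }
                        packetsShipped.incrementAndGet(); // stand-in for sending to a datanode
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }, "streamer-" + name);
            streamer.setDaemon(true);
            streamer.start(); // started as soon as the stream is opened
        }

        void write(byte[] data) {
            packets.add(data);
        }

        int shipped() {
            return packetsShipped.get();
        }

        @Override
        public void close() {
            packets.add(EOF);
            try {
                streamer.join(); // wait until queued packets are shipped
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    // Opens `files` streams, writes one packet to the first `written` of them,
    // closes them all, and returns how many streamer threads shipped anything.
    public static int runScenario(int files, int written) {
        List<ModelStream> streams = new ArrayList<>();
        for (int i = 0; i < files; i++) {
            streams.add(new ModelStream("part-" + i)); // one thread per open stream
        }
        for (int i = 0; i < written; i++) {
            streams.get(i).write(new byte[] {1, 2, 3}); // only these streams get data
        }
        int busy = 0;
        for (ModelStream s : streams) {
            s.close();
            if (s.shipped() > 0) {
                busy++;
            }
        }
        return busy;
    }

    public static void main(String[] args) {
        int files = 100; // the original question asks about 10,000
        int busy = runScenario(files, 1);
        System.out.println(files + " streamer threads started, " + busy + " did any work");
    }
}
```

With 100 streams and one write, main prints "100 streamer threads started, 1 did any work".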

Please correct me if I'm wrong.

-- 
Harsh J
www.harshj.com