Posted to common-user@hadoop.apache.org by Jason Venner <ja...@attributor.com> on 2007/12/12 22:19:28 UTC

Question on Critical Region size for SequenceFile next/write - 0.15.1

We have relatively heavyweight objects that we pass around the cluster 
for our map/reduce tasks.
We have noticed that when we use the multithreaded mapper, we 
don't get very high utilization of either CPU or disk.

On investigating, we discovered that the entirety of next(key, value) 
and the entirety of write(key, value) are synchronized on the file 
object.

This causes all threads to back up on the serialization/deserialization.
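
Roughly, the pattern looks like the sketch below (an illustration only, 
not the actual SequenceFile source; the class and field names are 
invented):

    import java.io.DataOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.Writable;

    class CoarseLockedWriter {
      private final DataOutputStream out;
      private final DataOutputBuffer buffer = new DataOutputBuffer(); // shared

      CoarseLockedWriter(DataOutputStream out) { this.out = out; }

      // The whole method is synchronized, so serialization itself runs
      // under the lock and N threads queue behind one writer.
      public synchronized void append(Writable key, Writable val)
          throws IOException {
        buffer.reset();
        key.write(buffer);                  // serialization, under the lock
        val.write(buffer);                  // serialization, under the lock
        out.writeInt(buffer.getLength());   // only the raw write truly
        out.write(buffer.getData(), 0, buffer.getLength()); // needs the lock
      }
    }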

 Before we start coding, are there any current patches floating around 
to shrink this critical window? It is pretty straightforward for 
write, but not so simple for next.

We run multithreaded mappers because we have more CPUs than disk arms 
on our cluster machines, and some of our tasks are inherently threaded, 
so we can't just raise the maximum task number.

Thanks -- Jason

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Posted by Jason Venner <ja...@attributor.com>.
Our first cut at this is generating about 4x the I/O; we are now 
saturating on disk.

These results are not definitive; they are eyeball estimates.


Jason Venner wrote:
> I have the write side working; the read side seems to be more complex 
> and I am digging into it.
>
> Doug Cutting wrote:
>> Ted Dunning wrote:
>>> It seems reasonable that (de)serialization could be done in a threaded
>>> fashion and then just block on the (read) write itself.
>>
>> That would require a buffer per thread, e.g., replacing Writer#buffer 
>> with a ThreadLocal of DataOutputBuffers.  The deflater-related 
>> objects would also need to be accessed through ThreadLocals.  That could 
>> work.
>>
>> Doug

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Posted by Jason Venner <ja...@attributor.com>.
I have the write side working; the read side seems to be more complex 
and I am digging into it.
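
One possible shape for the read side, as a rough sketch (the simple 
length-prefixed framing below is a simplification of the real 
SequenceFile record format): take the lock only to pull the raw record 
bytes off the stream, then deserialize outside it.

    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;
    import org.apache.hadoop.io.DataInputBuffer;
    import org.apache.hadoop.io.Writable;

    class NarrowLockedReader {
      private final DataInputStream in;

      NarrowLockedReader(DataInputStream in) { this.in = in; }

      /** Reads one record; returns false at end of stream. */
      public boolean next(Writable key, Writable val) throws IOException {
        byte[] record;
        synchronized (in) {                // critical region: raw bytes only
          try {
            int recordLen = in.readInt();  // simplified record framing
            record = new byte[recordLen];
            in.readFully(record);
          } catch (EOFException eof) {
            return false;
          }
        }
        DataInputBuffer buf = new DataInputBuffer();
        buf.reset(record, record.length);
        key.readFields(buf);               // deserialization, outside the lock
        val.readFields(buf);
        return true;
      }
    }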

Doug Cutting wrote:
> Ted Dunning wrote:
>> It seems reasonable that (de)serialization could be done in a threaded
>> fashion and then just block on the (read) write itself.
>
> That would require a buffer per thread, e.g., replacing Writer#buffer 
> with a ThreadLocal of DataOutputBuffers.  The deflater-related objects 
> would also need to be accessed through ThreadLocals.  That could work.
>
> Doug

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Posted by Doug Cutting <cu...@apache.org>.
Ted Dunning wrote:
> It seems reasonable that (de)serialization could be done in a threaded
> fashion and then just block on the (read) write itself.

That would require a buffer per thread, e.g., replacing Writer#buffer 
with a ThreadLocal of DataOutputBuffers.  The deflater-related objects 
would also need to be accessed through ThreadLocals.  That could work.

Doug
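
A minimal sketch of that per-thread-buffer idea (one reading of the 
suggestion above, not a patch from this thread; the class name and 
record framing are illustrative):

    import java.io.DataOutputStream;
    import java.io.IOException;
    import org.apache.hadoop.io.DataOutputBuffer;
    import org.apache.hadoop.io.Writable;

    class ThreadLocalWriter {
      private final DataOutputStream out;
      private final ThreadLocal<DataOutputBuffer> localBuffer =
          new ThreadLocal<DataOutputBuffer>() {
            protected DataOutputBuffer initialValue() {
              return new DataOutputBuffer();
            }
          };

      ThreadLocalWriter(DataOutputStream out) { this.out = out; }

      public void append(Writable key, Writable val) throws IOException {
        DataOutputBuffer buffer = localBuffer.get(); // this thread's buffer
        buffer.reset();
        key.write(buffer);                 // serialization, no lock held
        val.write(buffer);
        synchronized (out) {               // lock covers only the raw write
          out.writeInt(buffer.getLength());
          out.write(buffer.getData(), 0, buffer.getLength());
        }
      }
    }

For compressed files the deflater state would likewise have to move 
into a ThreadLocal, as noted above.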

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Posted by Ted Dunning <td...@veoh.com>.

It seems reasonable that (de)serialization could be done in a threaded
fashion and then just block on the (read) write itself.

That would explain the utilization, which I suspect is close to 1/N, where 
N is the number of processors.
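
As a rough sanity check: with one lock serializing all the work, only 
one thread makes progress at a time, so an 8-core box would sit near 
1/8 (about 12%) CPU utilization; the roughly 1/5 figure reported below 
is in the same ballpark.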


On 12/12/07 2:07 PM, "Jason Venner" <ja...@attributor.com> wrote:

> Our theory is that the serialization time (not the disk write time) and
> the deserialization time (not the disk read time) are the bottleneck.
> I have some test code nearly ready to go; if it changes the machine
> utilization on my standard job, I will let you know...


Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Posted by Jason Venner <ja...@attributor.com>.
We have been monitoring the performance of our jobs using slaves.sh vmstat 5.

When we run the very simple mappers that basically read input, 
do very little, and write output, neither the CPU nor the disk is 
fully utilized. We expect to saturate on either CPU or disk. It 
may be that we are saturating on network, but our network read speed is 
about the same as our disk read speed, ~50 MB/sec.
We only see about 1/5 of the disk bandwidth and 1/5 of the CPU being 
utilized, and increasing the number of threads doesn't change the 
utilization.

Our theory is that the serialization time (not the disk write time) and 
the deserialization time (not the disk read time) are the bottleneck.
I have some test code nearly ready to go; if it changes the machine 
utilization on my standard job, I will let you know...


Doug Cutting wrote:
> Jason Venner wrote:
>> On investigating, we discovered that the entirety of 
>> next(key, value) and the entirety of write(key, value) are 
>> synchronized on the file object.
>>
>> This causes all threads to back up on the serialization/deserialization.
>
> I'm not sure what you want to happen here.  If you've got a bunch of 
> threads writing to a single file, and that's your performance 
> bottleneck, I don't see how to improve the situation except to write 
> to multiple files on different drives, or to spread your load across a 
> larger cluster (another way to get more drives).
>
> Doug

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Posted by Doug Cutting <cu...@apache.org>.
Jason Venner wrote:
> On investigating, we discovered that the entirety of next(key, value) 
> and the entirety of write(key, value) are synchronized on the file 
> object.
> 
> This causes all threads to back up on the serialization/deserialization.

I'm not sure what you want to happen here.  If you've got a bunch of 
threads writing to a single file, and that's your performance 
bottleneck, I don't see how to improve the situation except to write to 
multiple files on different drives, or to spread your load across a 
larger cluster (another way to get more drives).

Doug
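
A rough sketch of that fan-out approach (FanOutWriter and its 
round-robin policy are illustrative; SequenceFile.Writer#append is the 
existing call, already synchronized per writer):

    import java.io.IOException;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Writable;

    class FanOutWriter {
      private final SequenceFile.Writer[] writers; // ideally one per drive
      private int next = 0;                        // guarded by this

      FanOutWriter(SequenceFile.Writer[] writers) { this.writers = writers; }

      public void append(Writable key, Writable val) throws IOException {
        SequenceFile.Writer w;
        synchronized (this) {              // cheap: just pick the next writer
          w = writers[next];
          next = (next + 1) % writers.length;
        }
        w.append(key, val);                // append locks per writer, so
      }                                    // contention spreads over N locks
    }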