Posted to user@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2010/10/19 22:27:55 UTC

Re: Throttling ColumnFamilyRecordReader

(Moving to user@.)

Isn't reducing the number of map tasks the easiest way to tune this?

Also: in 0.7 you can use NetworkTopologyStrategy to designate a group
of nodes as your hadoop "datacenter" so the workloads won't overlap.
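
As a rough sketch of that (this assumes PropertyFileSnitch; the
addresses and datacenter names below are made up, and the exact
keyspace-creation syntax varies across 0.7.x releases):

  # conf/cassandra-topology.properties: give the analytics nodes their
  # own datacenter so Hadoop reads stay off the OLTP replicas
  10.0.0.1=DC_OLTP:RAC1
  10.0.0.2=DC_OLTP:RAC1
  10.0.1.1=DC_HADOOP:RAC1
  10.0.1.2=DC_HADOOP:RAC1

Then define the keyspace with placement_strategy
org.apache.cassandra.locator.NetworkTopologyStrategy and strategy
options along the lines of {DC_OLTP: 2, DC_HADOOP: 1}, and point the
Hadoop jobs at the DC_HADOOP nodes.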

On Tue, Oct 19, 2010 at 3:22 PM, Michael Moores <mm...@real.com> wrote:
> Does it make sense to add some kind of throttle capability on the ColumnFamilyRecordReader for Hadoop?
>
> If I have 60 or so map tasks running at the same time while the cluster is already heavily loaded with OLTP operations, I see degraded online performance
> that may not be acceptable.  (I'm loading an 8-node cluster with 2000 TPS.)  By default my cluster of 8 nodes (which also run the Hadoop TaskTrackers) has 8 map tasks per node making the get_range_slices call, based on what ColumnFamilyInputFormat has calculated from my token ranges.
> I can increase the input split size (ConfigHelper.setInputSplitSize()) so that there
> is only one map task per node, and this helps quite a bit.
>
> But is it reasonable to provide a configurable sleep between smaller range queries?  That would stretch out the map time
> and leave the OLTP processing less affected.
>
>
> --Michael
>
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Throttling ColumnFamilyRecordReader

Posted by Michael Moores <mm...@real.com>.
Sorry, I had a misunderstanding of the MapReduce report output.
I did reduce mapreduce.tasktracker.map.tasks.maximum (the number of concurrent map tasks per node) from the default of 2 to 1.
I suppose if I want to do this on a per-job/per-user basis I'll try out the Hadoop Fair Scheduler.
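
For anyone following along, the split-size half of this looks roughly
like the sketch below. It is written against the 0.7-era
org.apache.cassandra.hadoop API from memory, so method names, defaults,
and the example values (and the made-up ScanJobSetup / "cf-scan" names)
may need adjusting; the usual keyspace/column family/predicate setup is
omitted.

  import org.apache.cassandra.hadoop.ColumnFamilyInputFormat;
  import org.apache.cassandra.hadoop.ConfigHelper;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.mapreduce.Job;

  public class ScanJobSetup {
      public static Job buildJob() throws Exception {
          Job job = new Job(new Configuration(), "cf-scan");
          job.setInputFormatClass(ColumnFamilyInputFormat.class);

          // Fewer, larger splits: roughly one map task per node instead
          // of eight (the default split size is on the order of 64k rows).
          ConfigHelper.setInputSplitSize(job.getConfiguration(), 1024 * 1024);

          // Smaller batches per get_range_slices call keep each request
          // cheap, which is gentler on the concurrent OLTP traffic.
          ConfigHelper.setRangeBatchSize(job.getConfiguration(), 1024);
          return job;
      }
  }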




On Oct 19, 2010, at 1:27 PM, Jonathan Ellis wrote:

> Isn't reducing the number of map tasks the easiest way to tune this?
> [...]