Posted to dev@mahout.apache.org by Hao Zheng <vo...@gmail.com> on 2008/03/25 13:49:32 UTC

some question about the nips paper

hi all devs,

I have read through the dev mailing list and have a rough idea of the
progress of Mahout. I have also read the Google paper and the NIPS
paper. Regarding the NIPS paper, "Map-Reduce for Machine Learning on
Multicore", I have some questions.

1. Sect. 4.1, Algorithm Time Complexity Analysis.
The paper assumes m >> n, i.e. that there are far more training
instances than features. Its datasets do have very few features, but
this may not be true for many tasks, e.g. text classification, where
the feature dimension can reach 10^4-10^5. Will the analysis still
hold in that case?

2. Sect. 4.1, too.
"reduce phase can minimize communication by combining data as it's
passed back; this accounts for the logP factor". Could you help me
figure out how this logP factor is calculated?

3. Sect 5.4, Results and Discussion.
"SVM gets about 13.6% speed up on average over 16 cores". Should that
be 13.6% or a factor of 13.6? From Figure 2, it looks like it should
be a factor of 13.6.

4. Sect 5.4, too.
"Finally, the above are runs on multiprocessor machines." Whether
multiprocessor or multicore, the experiments run on a single machine
with shared memory. But in practice M/R targets multi-machine
clusters, which involve a much higher inter-machine communication
cost. Doesn't that make the paper's results questionable for that
setting?

Maybe some of these questions rest on a misreading on my part. Please
help me reach a full understanding of the paper. Thanks.

Re: some question about the nips paper

Posted by Hao Zheng <vo...@gmail.com>.
If the cost is counted at each level, you are right; maybe that is
exactly the answer. But I am still not clear on how it would be
implemented. Could anyone provide a more detailed description?
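
To make my question concrete, here is a rough sketch (plain Python;
every name in it is made up by me, so it is only a guess at what the
paper means) of how I imagine the tree-structured combine. The
comments mark where I think the per-level accounting comes in:

# Rough sketch of a tree-structured combine over P partial results.
# Assumption: each partial result (e.g. a partial gradient sum or an
# n x n statistics matrix) has a fixed size, so shipping one of them
# costs f(m,n) no matter how many inputs were already folded into it.

def tree_combine(partials, merge):
    """Combine P partial results in about log2(P) rounds of merges."""
    levels = 0
    while len(partials) > 1:
        # All merges in this round could run in parallel on different
        # cores/machines, so on the critical path the round costs one
        # transfer of size f(m,n), not len(partials)/2 transfers.
        merged = []
        for i in range(0, len(partials) - 1, 2):
            merged.append(merge(partials[i], partials[i + 1]))
        if len(partials) % 2 == 1:      # odd leftover passes through
            merged.append(partials[-1])
        partials = merged
        levels += 1
    # levels == ceil(log2(P)), so critical-path cost ~ f(m,n) * log(P)
    return partials[0]

# toy usage: sum 8 partial gradient vectors of length 4
parts = [[float(i)] * 4 for i in range(8)]
print(tree_combine(parts, lambda a, b: [x + y for x, y in zip(a, b)]))

If that is roughly right, then each round of the while loop is one
level: its merges run in parallel, so the critical path pays one
f(m,n) transfer per level, i.e. f(m,n)*log(P) in time, even though the
total volume over all transfers is still about f(m,n)*P.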

2008/3/28 Niranjan Balasubramanian <ni...@cs.umass.edu>:
> The cost is f(m,n) at each level in the tree (not at each node).
>  Thus, you have f(m,n) x the number of levels which gives f(m,n) log (P).
>
>  I might be way off here but this is what I guess.
>  ~ Niranjan

Re: some question about the nips paper

Posted by Niranjan Balasubramanian <ni...@cs.umass.edu>.
The cost is f(m,n) at each level in the tree (not at each node): the
merges within a level happen in parallel, and the partial result a
node passes up has a fixed size no matter how many inputs were folded
into it. Thus you have f(m,n) times the number of levels, which gives
f(m,n) log(P).

I might be way off here but this is what I guess.
~ Niranjan



On Mar 27, 2008, at 12:08 PM, Hao Zheng wrote:

> let it be a tree of depth log(P), then the total communication cost  
> will be
> f(m,n)*(1/2+1/4+1/8+...)*P = f(m,n)*P, but not log(P)
> do I see what you mean?


Re: some question about the nips paper

Posted by Hao Zheng <vo...@gmail.com>.
Let it be a tree of depth log(P); then the total communication cost,
summed over all transfers, will be
f(m,n)*(1/2+1/4+1/8+...)*P = f(m,n)*P, not f(m,n)*log(P).
Do I see what you mean?

2008/3/27 Niranjan Balasubramanian <ni...@cs.umass.edu>:
> On Mar 26, 2008, at 12:06 AM, Hao Zheng wrote:
>
> > it's log(P). I just don't know how log(P) is obtained.
>  >
>
>  I am doing some guess work here but I think the log (P) factor arises
>  out of the combining the data that is passed back from the various
>  processors. Imagine tree of processors with depth log (P) and at each
>  level the communication amounts to some function of n and m, f(m,n),
>  then the total communication cost would be
>
>  f(m,n) log (P).
>
>  ~ Niranjan

Re: some question about the nips paper

Posted by Niranjan Balasubramanian <ni...@cs.umass.edu>.
On Mar 26, 2008, at 12:06 AM, Hao Zheng wrote:
> it's log(P). I just don't know how log(P) is obtained.
>

I am doing some guesswork here, but I think the log(P) factor arises
from combining the data that is passed back from the various
processors. Imagine a tree of processors with depth log(P) where, at
each level, the communication amounts to some function f(m,n) of m and
n; then the total communication cost would be

f(m,n) log(P).

~ Niranjan



Re: some question about the nips paper

Posted by Hao Zheng <vo...@gmail.com>.
it's log(P). I just don't know how log(P) is obtained.

On 3/26/08, Grant Ingersoll <gs...@apache.org> wrote:
>
>  On Mar 25, 2008, at 4:11 PM, Isabel Drost wrote:
>  >
>  >
>  >> 2. Sect. 4.1, too.
>  >> "reduce phase can minimize communication by combining data as it's
>  >> passed back; this accounts for the logP factor", Could you help me
>  >> figure out how logP is calculated.
>  >
>  > Anyone else who can help out here?
>
>
> Isn't this just log(P) where P is the number of cores?  The version of
>  the paper I have says log(P) not logP, so maybe there is a typo?
>
>   From earlier in 4.1:
>  "We assume that the dimension of the inputs is n (i.e., x
>  ∈R
>  n
>  ), that we have m
>  training examples, and that there are P cores"
>
>
>
>  -Grant
>
>
>

Re: some question about the nips paper

Posted by Grant Ingersoll <gs...@apache.org>.
On Mar 25, 2008, at 4:11 PM, Isabel Drost wrote:
>
>
>> 2. Sect. 4.1, too.
>> "reduce phase can minimize communication by combining data as it's
>> passed back; this accounts for the logP factor", Could you help me
>> figure out how logP is calculated.
>
> Anyone else who can help out here?

Isn't this just log(P) where P is the number of cores?  The version of  
the paper I have says log(P) not logP, so maybe there is a typo?

From earlier in 4.1:
"We assume that the dimension of the inputs is n (i.e., x ∈ R^n), that we
have m training examples, and that there are P cores"


-Grant



Re: some question about the nips paper

Posted by Isabel Drost <ap...@isabel-drost.de>.
On Wednesday 26 March 2008, Hao Zheng wrote:
> for the 4th question, maybe we can still gain some speedups on
> multi-machine clusters. But I suspect that we should also explicitly
> consider the communication cost, which is non-trivial in such setting.
> What do you think?

I agree with you. I could imagine that it is not as easy as for a
multi-core machine to arrive at an accurate estimate. One would have
to account for variables like the amount of information sent over the
network, network latency, bandwidth and the like...
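
Just to make explicit the kind of accounting I have in mind, here is a
very rough back-of-the-envelope sketch (plain Python; every constant
and parameter name below is invented for illustration, nothing here is
measured):

import math

# Very rough cost model for one map/reduce-style iteration, purely
# illustrative -- every constant passed in below is made up.

def iteration_time(m, n, p, flops_per_example, core_flops,
                   result_bytes, latency_s, bandwidth_bps):
    """Crude estimate: parallel compute plus a tree-combine step."""
    compute = m * flops_per_example / (p * core_flops)
    levels = math.ceil(math.log2(p)) if p > 1 else 0
    # each level: one message latency plus shipping one partial result
    communicate = levels * (latency_s + result_bytes / bandwidth_bps)
    return compute + communicate

# toy comparison: 16 cores with shared memory vs 16 machines on a
# network, for m = 10^6 examples, n = 100 features, shipping an
# n x n partial matrix of 8-byte doubles at each combine step
m, n, p = 10**6, 100, 16
flops = 2.0 * n
result = 8.0 * n * n
print("cores  :", iteration_time(m, n, p, flops, 1e9, result, 1e-7, 10e9))
print("cluster:", iteration_time(m, n, p, flops, 1e9, result, 1e-3, 100e6))

The only point being that on a cluster the latency and bandwidth terms
stop being negligible next to the compute term, so the speedup curves
from the paper would have to be re-derived with those terms included.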

Isabel


-- 
Any circuit design must contain at least one part which is obsolete, two parts 
which are unobtainable, and three parts which are still under development.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>

Re: some question about the nips paper

Posted by Hao Zheng <vo...@gmail.com>.
Isabel, thanks for your answer.
For the 4th question, maybe we can still gain some speedup on
multi-machine clusters. But I suspect that we would also have to
explicitly account for the communication cost, which is non-trivial in
such a setting. What do you think?


Re: some question about the nips paper

Posted by Isabel Drost <ap...@isabel-drost.de>.
On Tuesday 25 March 2008, Hao Zheng wrote:
> 1. Sect. 4.1 Algorithm Time Complexity Analysis.
> the paper assumes m >> n, that is, the training instances are much
> larger than the features. its datasets do have very few features. but
> this may not be true for many tasks, e.g. text classification, where
> feature dimensions will reach 10^4-10^5. then will the analysis still
> hold?

What I could read directly from the paper, in that very section: the
analysis will not hold in this case for algorithms that require matrix
inversions or eigen decompositions, as long as those operations are
not executed in parallel. The authors did not implement parallel
versions of these operations; the reason they give is that in their
datasets m >> n.

The authors state themselves that there is extensive research on parallelising 
eigen decomposition and matrix inversion as well - so if we assume that we do 
have a matrix package that can do these operations in a distributed way, IMHO 
the analysis in the paper should still hold even for algorithms that require 
these steps.
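
To make the structure concrete, here is a small sketch (plain Python
with NumPy; my own simplification of the summation form the paper
describes for linear regression, not Mahout code). The map/reduce part
parallelises over the m examples, while the final n x n solve still
runs on a single core - exactly the step that stops being negligible
once n reaches 10^4-10^5:

import numpy as np

# Summation-form linear regression, roughly in the spirit of the NIPS
# paper: each "mapper" computes partial sums over its slice of the m
# examples, the "reducer" adds them up, and the master solves the
# resulting n x n system. Single-process sketch for illustration only.

def map_partial(X_slice, y_slice):
    # partial A = sum x_i x_i^T and partial b = sum x_i y_i
    return X_slice.T @ X_slice, X_slice.T @ y_slice

def reduce_partials(partials):
    # the combined objects are fixed-size (n x n and n), so this step
    # is exactly the kind of combine a tree-structured reduce can do
    A = sum(p[0] for p in partials)
    b = sum(p[1] for p in partials)
    return A, b

def fit(X, y, num_mappers=4):
    slices = np.array_split(np.arange(len(y)), num_mappers)
    partials = [map_partial(X[idx], y[idx]) for idx in slices]
    A, b = reduce_partials(partials)
    # the serial bottleneck: an O(n^3) solve that the paper leaves
    # unparallelised because in its datasets m >> n
    return np.linalg.solve(A, b)

# toy data: m = 10000 examples, n = 5 features
rng = np.random.default_rng(0)
X = rng.normal(size=(10000, 5))
theta_true = np.arange(1.0, 6.0)
y = X @ theta_true + 0.01 * rng.normal(size=10000)
print(fit(X, y))   # should be close to [1, 2, 3, 4, 5]

With n around 10^4 the A matrix alone already has 10^8 entries, so, as
the paper itself notes, the analysis only carries over if that solve
(or the corresponding eigen decomposition) is parallelised as well.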


> 2. Sect. 4.1, too.
> "reduce phase can minimize communication by combining data as it's
> passed back; this accounts for the logP factor", Could you help me
> figure out how logP is calculated.

Anyone else who can help out here?


> 3. Sect 5.4 Results and Discussion
> "SVM gets about 13.6% speed up on average over 16 cores", it's 13.6%
> or 13.6? From figure 2, it seems should be 13.6?

The axes on the graphs do not have clear titles, but I would agree
that it should be a factor of 13.6 as well.


> 4. Sect 5.4, too.
> "Finally, the above are runs on multiprocessor machines." No matter
> multiprocess or multicore, it runs on a single machine which have a
> share memory.

The main motivation for the paper was the rise of multi-core machines,
which call for parallel algorithms even when one does not have a
cluster available.


> But actually, M/R is for multi-machine, which will involve much more cost on
> inter-machine communication. So the results of the paper may be
> questionable? 

I think you should not expect to get the exact same speedups on multi-machine 
clusters. Still I think one can expect faster computation for large datasets 
even in this setting. What do others think?



Isabel


-- 
There is no TRUTH.  There is no REALITY.  There is no CONSISTENCY. There are 
no ABSOLUTE STATEMENTS.   I'm very probably wrong.
  |\      _,,,---,,_       Web:   <http://www.isabel-drost.de>
  /,`.-'`'    -.  ;-;;,_
 |,4-  ) )-,_..;\ (  `'-'
'---''(_/--'  `-'\_) (fL)  IM:  <xm...@spaceboyz.net>