Posted to user@mahout.apache.org by Mahmood N <nt...@yahoo.com.INVALID> on 2016/02/03 19:35:10 UTC

Code execution path of mahout

Hi,
This is a question about Mahout 0.6, which I know is pretty old. Consider this command (I don't know whether it is still valid in newer versions):

./bin/mahout testclassifier -m $CLASSIFICATION_MODEL -d $CLASSIFICATION_INPUT --method mapreduce

I want to know which parts of the code are executed by that command, i.e. the execution path and the functions that get called.

Although the question is about an old version, I would appreciate it if you could shed some light on this, even for newer versions.

 
Regards,
Mahmood

Re: Code execution path of mahout

Posted by Andrew Musselman <an...@gmail.com>.
I'd say work your way through that class and follow along with what it
does; I don't know of any documents like that beyond the code and what's on
the Mahout web site at http://mahout.apache.org.

On Wednesday, February 3, 2016, Mahmood Naderan <nt...@yahoo.com>
wrote:

> Thanks a lot for that. I am getting closer to what I was searching for...
> Is there any high-level document describing what the classifier does
> (using MapReduce) after the training phase? For example:
> 1- Reading chunks
> 2- Sorting each chunk
> 3-...
>
> I didn't find such an example on the web. Maybe I used the wrong keywords.
> The only thing I found that seems related to my work is
>
> https://github.com/fredang/mahout-naive-bayes-example/blob/master/src/main/java/com/chimpler/example/bayes/Classifier.java
>
> Can you confirm that? Do you think this is what I am looking for?
>
> Regards,
> Mahmood

Re: Code execution path of mahout

Posted by Mahmood Naderan <nt...@yahoo.com.INVALID>.
Thanks a lot for that. I am getting closer to what I was searching for...
Is there any high-level document describing what the classifier does (using MapReduce) after the training phase? For example:
1- Reading chunks
2- Sorting each chunk
3-...

I didn't find such an example on the web. Maybe I used the wrong keywords. The only thing I found that seems related to my work is
https://github.com/fredang/mahout-naive-bayes-example/blob/master/src/main/java/com/chimpler/example/bayes/Classifier.java

Can you confirm that? Do you think this is what I am looking for?
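
For orientation only: the linked example is a naive Bayes classifier, and applying a trained model of that kind generally comes down to scoring every label for a document and keeping the best one. The sketch below is plain Java with made-up names (NaiveBayesSketch, classify, logs) and toy probabilities. It is not Mahout's implementation, it simply skips words the model has never seen, and it says nothing about how the MapReduce job splits the input.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Illustration only: hypothetical names and toy numbers, not Mahout code.
class NaiveBayesSketch {

    // Picks the label with the highest score, where
    // score(label) = log prior + sum over words of count * log P(word | label).
    static String classify(Map<String, Integer> wordCounts,
                           Map<String, Double> logPrior,
                           Map<String, Map<String, Double>> logLikelihood) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Double> prior : logPrior.entrySet()) {
            String label = prior.getKey();
            Map<String, Double> perWord =
                logLikelihood.getOrDefault(label, Collections.<String, Double>emptyMap());
            double score = prior.getValue();
            for (Map.Entry<String, Integer> wc : wordCounts.entrySet()) {
                Double ll = perWord.get(wc.getKey());
                if (ll != null) {
                    score += wc.getValue() * ll; // words unknown to the model are skipped
                }
            }
            if (score > bestScore) {
                bestScore = score;
                best = label;
            }
        }
        return best;
    }

    // Small helper that stores the logarithms of two probabilities.
    static Map<String, Double> logs(String k1, double p1, String k2, double p2) {
        Map<String, Double> m = new HashMap<>();
        m.put(k1, Math.log(p1));
        m.put(k2, Math.log(p2));
        return m;
    }

    public static void main(String[] args) {
        // Toy "trained model": two labels, two words.
        Map<String, Double> logPrior = logs("spam", 0.5, "ham", 0.5);
        Map<String, Map<String, Double>> logLikelihood = new HashMap<>();
        logLikelihood.put("spam", logs("offer", 0.8, "meeting", 0.2));
        logLikelihood.put("ham", logs("offer", 0.1, "meeting", 0.9));

        Map<String, Integer> doc = new HashMap<>();
        doc.put("offer", 2);
        System.out.println(classify(doc, logPrior, logLikelihood));
    }
}
```

In a distributed test run, a step like classify would presumably be applied to each input record independently, with the per-record results aggregated afterwards; the actual job structure has to be read from the Mahout source.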
 
Regards,
Mahmood


On Wednesday, February 3, 2016 10:59 PM, Andrew Musselman <an...@gmail.com> wrote:



Here are a bunch https://github.com/apache/mahout/tree/master/math/src/main/java/org/apache/mahout/math

Large matrices are typical, often on the order of hundreds of thousands to millions of rows and hundreds of columns.

Re: Code execution path of mahout

Posted by Andrew Musselman <an...@gmail.com>.
Here are a bunch
https://github.com/apache/mahout/tree/master/math/src/main/java/org/apache/mahout/math

Large matrices are typical, often on the order of hundreds of thousands to
millions of rows and hundreds of columns.
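
For a sense of scale, here is a quick back-of-the-envelope sketch of what such sizes would mean if a matrix were stored densely as 8-byte doubles. The specific dimensions (10 x 10 and 1,000,000 x 500) are picked purely for illustration and are not taken from any benchmark; MatrixFootprint and denseBytes are made-up names.

```java
// Illustration only: estimates the raw storage of a dense matrix of doubles.
class MatrixFootprint {

    // rows * cols * 8 bytes per double, ignoring object and array overhead.
    // multiplyExact throws ArithmeticException instead of silently overflowing.
    static long denseBytes(long rows, long cols) {
        return Math.multiplyExact(Math.multiplyExact(rows, cols), (long) Double.BYTES);
    }

    public static void main(String[] args) {
        long small = denseBytes(10, 10);
        long large = denseBytes(1_000_000L, 500);
        System.out.println("10 x 10 dense: " + small + " bytes");
        System.out.println("1,000,000 x 500 dense: " + large + " bytes");
    }
}
```

The larger case works out to 4,000,000,000 bytes of raw values, which gives an idea of why sparse storage and processing in chunks matter at these sizes.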

On Wed, Feb 3, 2016 at 11:21 AM, Mahmood N <nt...@yahoo.com> wrote:

> > The new code still uses sparse and dense vectors and matrices, with local
> > and distributed iterators over rows and blocking into chunks of matrices
> > as appropriate.
>
> That is a good thing to know...
> Regardless of the comparison, do you know where the most important data
> structures are defined? I mean where does it create the chunks (or read the
> chunks)? What are the sizes of the matrices? Are they typically small
> (10x10) or large (1000x1000)?
>
>
> Regards,
> Mahmood

Re: Code execution path of mahout

Posted by Mahmood N <nt...@yahoo.com.INVALID>.
> The new code still uses sparse and dense vectors and matrices, with local and distributed iterators over rows and blocking into chunks of matrices as appropriate.

That is a good thing to know...
Regardless of the comparison, do you know where the most important data structures are defined? I mean, where does it create the chunks (or read the chunks)? What are the sizes of the matrices? Are they typically small (10x10) or large (1000x1000)?

 
Regards,
Mahmood


On Wednesday, February 3, 2016 10:46 PM, Andrew Musselman <an...@gmail.com> wrote:



The new code still uses sparse and dense vectors and matrices, with local and distributed iterators over rows and blocking into chunks of matrices as appropriate.

You would be better off checking out the newest version from source (https://github.com/apache/mahout) and taking a look since I won't be able to provide any useful comparison for your specific needs.

Re: Code execution path of mahout

Posted by Andrew Musselman <an...@gmail.com>.
The new code still uses sparse and dense vectors and matrices, with local
and distributed iterators over rows and blocking into chunks of matrices as
appropriate.
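
To make the dense/sparse distinction concrete, here is a minimal plain-Java sketch. It deliberately does not use Mahout's own classes: VectorSketch, DenseVec, SparseVec and dot are names invented for this illustration. A dense vector stores every entry, a sparse vector stores only the non-zero entries, and an operation such as a dot product only needs to visit the stored entries.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: these types are hypothetical and are not Mahout classes.
class VectorSketch {

    // Dense storage: one slot per index, zeros included.
    static final class DenseVec {
        final double[] values;

        DenseVec(double... values) {
            this.values = values.clone();
        }
    }

    // Sparse storage: only non-zero entries are kept, keyed by index.
    static final class SparseVec {
        final int size;
        final TreeMap<Integer, Double> nonZeros = new TreeMap<>();

        SparseVec(int size) {
            this.size = size;
        }

        void set(int index, double value) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            if (value == 0.0) {
                nonZeros.remove(index); // zeros are never stored
            } else {
                nonZeros.put(index, value);
            }
        }
    }

    // Dot product that iterates over the stored sparse entries only.
    static double dot(SparseVec s, DenseVec d) {
        if (s.size != d.values.length) {
            throw new IllegalArgumentException("size mismatch");
        }
        double sum = 0.0;
        for (Map.Entry<Integer, Double> e : s.nonZeros.entrySet()) {
            sum += e.getValue() * d.values[e.getKey()];
        }
        return sum;
    }

    public static void main(String[] args) {
        DenseVec d = new DenseVec(1.0, 2.0, 3.0, 4.0);
        SparseVec s = new SparseVec(4);
        s.set(1, 10.0);
        s.set(3, 0.5);
        System.out.println(dot(s, d)); // 10*2 + 0.5*4
    }
}
```

The actual vector and matrix types are in the Mahout source tree; this sketch only shows the general storage trade-off.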

You would be better off checking out the newest version from source (
https://github.com/apache/mahout) and taking a look since I won't be able
to provide any useful comparison for your specific needs.

On Wed, Feb 3, 2016 at 10:57 AM, Mahmood N <nt...@yahoo.com> wrote:

> Dear Andrew,
>
> Thanks for your reply. In fact, I need this information as part of my
> study of some data analytics workloads. The benchmark was set up about
> 3 years ago by someone else. What I really want to know is how the software
> model (here, the code execution path) differs from regular desktop and
> engineering workloads. I mean, does the Mahout code make extensive use of
> arrays, matrices, lists, vectors, ...? Does Mahout rely on nested loops?
> What about the branch distribution in the code?
>
> So you could answer these questions for the new version, and I will then
> try to map the answers to the old version by comparing the functions.
>
> Regards,
> Mahmood

Re: Code execution path of mahout

Posted by Mahmood N <nt...@yahoo.com.INVALID>.
Dear Andrew, 

Thanks for your reply. In fact, I need this information as part of my study of some data analytics workloads. The benchmark was set up about 3 years ago by someone else. What I really want to know is how the software model (here, the code execution path) differs from regular desktop and engineering workloads. I mean, does the Mahout code make extensive use of arrays, matrices, lists, vectors, ...? Does Mahout rely on nested loops? What about the branch distribution in the code?

So you could answer these questions for the new version, and I will then try to map the answers to the old version by comparing the functions.
 
Regards,
Mahmood


On Wednesday, February 3, 2016 10:14 PM, Andrew Musselman <an...@gmail.com> wrote:



Hi Mahmood, it would be possible to trace the path in an IDE like IntelliJ, but there's no automated method to print it out, if that's what you're asking.

Definitely recommend upgrading if at all possible, as that's five major releases old.

Best
Andrew

Re: Code execution path of mahout

Posted by Andrew Musselman <an...@gmail.com>.
Hi Mahmood, it would be possible to trace the path in an IDE like IntelliJ,
but there's no automated method to print it out, if that's what you're
asking.
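
As a rough illustration of tracing by hand outside a debugger: a minimal sketch, assuming you can build the code from source and temporarily add a line to a method of interest. It relies only on the standard Thread.getStackTrace() call; the class and method names (CallPathDemo, describeCallPath, outer, inner) are made up for this example and are not part of Mahout.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Mahout: records the chain of callers
// that led to the point where describeCallPath() is invoked.
class CallPathDemo {

    // Returns "Class.method" for each stack frame, innermost caller first.
    static List<String> describeCallPath() {
        StackTraceElement[] frames = Thread.currentThread().getStackTrace();
        List<String> path = new ArrayList<>();
        for (StackTraceElement f : frames) {
            String cls = f.getClassName();
            String method = f.getMethodName();
            // Skip the bookkeeping frames: getStackTrace itself and this helper.
            if (cls.equals(Thread.class.getName()) && method.equals("getStackTrace")) {
                continue;
            }
            if (cls.equals(CallPathDemo.class.getName()) && method.equals("describeCallPath")) {
                continue;
            }
            path.add(cls + "." + method);
        }
        return path;
    }

    // Two stand-in methods so that there is a small call chain to observe.
    static List<String> inner() {
        return describeCallPath();
    }

    static List<String> outer() {
        return inner();
    }

    public static void main(String[] args) {
        for (String frame : outer()) {
            System.out.println(frame);
        }
    }
}
```

Dropping a call like describeCallPath() into a method and logging the result shows who called it on that particular run; it does not enumerate every possible path, which is where stepping through in an IDE still helps.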

Definitely recommend upgrading if at all possible, as that's five major
releases old.

Best
Andrew

On Wed, Feb 3, 2016 at 10:35 AM, Mahmood N <nt...@yahoo.com.invalid>
wrote:

> Hi,
> This is a question about Mahout 0.6, which I know is pretty old.
> Consider this command (I don't know whether it is still valid in newer
> versions):
>
> ./bin/mahout testclassifier -m $CLASSIFICATION_MODEL -d
> $CLASSIFICATION_INPUT --method mapreduce
>
> I want to know which parts of the code are executed by that command,
> i.e. the execution path and the functions that get called.
>
> Although the question is about an old version, I would appreciate it if
> you could shed some light on this, even for newer versions.
>
>
> Regards,
> Mahmood
>