Posted to user@mahout.apache.org by Viva Friend <vi...@gmail.com> on 2010/07/11 01:15:50 UTC

Any memory limitation in Mahout

Hi All,

I have been using the LibLINEAR SVM
http://wiki.pentaho.com/display/DATAMINING/LibLINEAR in WEKA. After
maxing out the physical memory of the machine, I am still facing an
OutOfMemoryError.

Will the Mahout implementations eliminate all memory limitation issues
because of their distributed processing nature?

For example, will the following JIRA issue for implementing LibLinear
in Mahout solve the OutOfMemoryError I see in the WEKA LibLinear
implementation?
https://issues.apache.org/jira/browse/MAHOUT-334

Thanks in advance.

vivafriend

Re: Any memory limitation in Mahout

Posted by Ted Dunning <te...@gmail.com>.
For the SVM implementation, I am not sure.

For the SGD implementation, the number of possible features will not have a
direct impact on memory usage, since a hashed feature vector is used.  The
number of class labels has a linear impact on the amount of memory required,
but the dependence isn't horrible.  For complex problems involving text,
several other variables and high-degree cross-terms, you will want to
have 50,000 - 1,000,000 internal features, which means that the scaling will
be about 8MB per class label.  I am not sure how performance would be at
more than a hundred class labels in any case, so the memory requirements are
significant, but not absolutely massive.
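
As a rough illustration of the two points above, here is a minimal Python sketch (not Mahout code). It assumes a CRC32 hash to map feature names into a fixed number of slots and assumes one 8-byte double per internal feature per class label. The 8-byte figure is consistent with the "8MB per class label" estimate at 1,000,000 features, but it is not confirmed in this thread. The function names are hypothetical.

```python
import zlib


def hashed_vector(tokens, num_features):
    """Map arbitrary string features into a fixed-length vector.

    The vector length depends only on num_features, not on how many
    distinct tokens exist, which is why the vocabulary size has no
    direct effect on memory.
    """
    vec = [0.0] * num_features
    for tok in tokens:
        # Assumption: CRC32 stands in for whatever hash a real encoder uses.
        idx = zlib.crc32(tok.encode("utf-8")) % num_features
        vec[idx] += 1.0
    return vec


def weight_memory_bytes(num_features, num_labels, bytes_per_weight=8):
    """Estimate model size, assuming one weight per feature per class label."""
    return num_features * num_labels * bytes_per_weight


v = hashed_vector(["mahout", "sgd", "mahout"], 16)
per_label = weight_memory_bytes(1_000_000, 1)
hundred_labels = weight_memory_bytes(1_000_000, 100)
```

Under these assumptions, one class label at 1,000,000 internal features costs 8,000,000 bytes, and a hundred labels cost 800,000,000 bytes, which matches the "significant, but not absolutely massive" description.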

On Sat, Jul 10, 2010 at 4:38 PM, Viva Friend <vi...@gmail.com> wrote:

>
> Also, will the number of features and class labels affect the memory
> usage for the Liblinear Mahout implementation?
>
>

Re: Any memory limitation in Mahout

Posted by Viva Friend <vi...@gmail.com>.
Thanks Ted.

Glad to hear that the number of training examples will not be limited by the
physical memory size of one box.

Also, will the number of features and class labels affect the memory
usage for the Liblinear Mahout implementation?

vivafriend



On Sat, Jul 10, 2010 at 4:21 PM, Ted Dunning <te...@gmail.com> wrote:
> Both MAHOUT-334 and MAHOUT-228 will help you with your memory problems.
>  Both are on-line algorithms which means that they use a constant amount of
> memory regardless of how many training examples you have.
>
> On Sat, Jul 10, 2010 at 4:15 PM, Viva Friend <vi...@gmail.com> wrote:
>
>> Hi All,
>>
>> I have been using SVM LibLINEAR
>> http://wiki.pentaho.com/display/DATAMINING/LibLINEAR in WEKA. After
>> maxing out the physical memory size of machine, I am still facing
>> OutOfMemoryError
>>
>> Will Mahout implementations eliminate all memory limitation issues
>> because of its distributed processing nature?
>>
>> For example, will the following jira issue for implementing LibLinear
>> in Mahhout solve the OutOfMemoryError in WEKA LibLinear
>> implementation?
>> https://issues.apache.org/jira/browse/MAHOUT-334
>>
>> Thanks in advance.
>>
>> vivafriend
>>
>

Re: Any memory limitation in Mahout

Posted by Ted Dunning <te...@gmail.com>.
Both MAHOUT-334 and MAHOUT-228 will help you with your memory problems.
Both are on-line algorithms, which means that they use a constant amount of
memory regardless of how many training examples you have.
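
To make the "constant memory" property concrete, here is a minimal Python sketch of an on-line learner. It is a plain logistic-regression SGD loop written for illustration, not the code from MAHOUT-334 or MAHOUT-228, and the function names and learning rate are assumptions. The only state it keeps is the weight vector, so memory stays the same whether the stream yields a thousand examples or a billion.

```python
import math


def sgd_logistic(examples, num_features, learning_rate=0.1):
    """Train binary logistic regression one example at a time.

    `examples` can be any iterable (for instance a generator reading from
    disk); each item is (sparse_features, label), where sparse_features
    maps a feature index to its value and label is 0 or 1.
    """
    weights = [0.0] * num_features  # the only state that is retained
    for features, label in examples:
        z = sum(weights[i] * x for i, x in features.items())
        p = 1.0 / (1.0 + math.exp(-z))
        for i, x in features.items():
            weights[i] += learning_rate * (label - p) * x
    return weights


def stream(n):
    """Hypothetical data source: yields examples lazily, never as a list."""
    for k in range(n):
        if k % 2 == 0:
            yield {0: 1.0}, 1
        else:
            yield {1: 1.0}, 0


w = sgd_logistic(stream(1000), num_features=2)
```

Each example is discarded after its update, which is the difference from a batch solver that has to hold the whole training set in memory.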

On Sat, Jul 10, 2010 at 4:15 PM, Viva Friend <vi...@gmail.com> wrote:

> Hi All,
>
> I have been using SVM LibLINEAR
> http://wiki.pentaho.com/display/DATAMINING/LibLINEAR in WEKA. After
> maxing out the physical memory size of machine, I am still facing
> OutOfMemoryError
>
> Will Mahout implementations eliminate all memory limitation issues
> because of its distributed processing nature?
>
> For example, will the following jira issue for implementing LibLinear
> in Mahhout solve the OutOfMemoryError in WEKA LibLinear
> implementation?
> https://issues.apache.org/jira/browse/MAHOUT-334
>
> Thanks in advance.
>
> vivafriend
>