
Posted to user@mahout.apache.org by Satish Dandu <Sa...@melbourneitdbs.com> on 2008/08/26 23:44:19 UTC

Taste Vs Weka

Hi, 

   Recently I started using Taste. It's easy to set up, and it looks really
good at picking recommendations (the demo using the GroupLens dataset for
Netflix data). I also went through Weka. Now my question is: is there any
difference between Weka and Taste (as both are open-source machine learning
software)? What advantages can we get by using Taste (in addition to Hadoop
integration)?

 

Thanks

Re: Taste Vs Weka

Posted by Sean Owen <sr...@gmail.com>.

I think they're tackling different problems -- Taste is only about
collaborative filtering, while Weka is more about data mining,
classification, etc. I don't think they overlap to any significant
degree. But Weka does overlap with other parts of Mahout. I myself am
probably not so qualified to compare the two, but maybe someone else
can.
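To make the distinction concrete, here is a minimal, hypothetical Java sketch of user-based collaborative filtering, the kind of problem Taste targets: it predicts a user's rating for an item purely from the ratings of similar users, with no item features or class labels involved. The names `UserBasedCf`, `addRating`, `similarity`, and `predict` are illustrative assumptions, not Taste's API, and cosine similarity over co-rated items is just one of several possible similarity measures.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of user-based collaborative filtering (not Taste's API).
 * A prediction is the similarity-weighted average of other users' ratings for the item.
 */
public class UserBasedCf {

    // ratings.get(user).get(item) = rating
    private final Map<String, Map<String, Double>> ratings = new HashMap<>();

    public void addRating(String user, String item, double rating) {
        ratings.computeIfAbsent(user, u -> new HashMap<>()).put(item, rating);
    }

    /** Cosine similarity over the items both users rated; 0 if they share none. */
    public double similarity(String a, String b) {
        Map<String, Double> ra = ratings.get(a);
        Map<String, Double> rb = ratings.get(b);
        if (ra == null || rb == null) {
            return 0.0;
        }
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : ra.entrySet()) {
            Double other = rb.get(e.getKey());
            if (other == null) {
                continue; // only co-rated items contribute
            }
            dot += e.getValue() * other;
            normA += e.getValue() * e.getValue();
            normB += other * other;
        }
        if (normA == 0 || normB == 0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Predicted rating of item for user, or NaN if no similar user rated it. */
    public double predict(String user, String item) {
        double weightedSum = 0, weightTotal = 0;
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            String other = e.getKey();
            Double r = e.getValue().get(item);
            if (other.equals(user) || r == null) {
                continue;
            }
            double sim = similarity(user, other);
            if (sim <= 0) {
                continue; // ignore users with no positive similarity
            }
            weightedSum += sim * r;
            weightTotal += sim;
        }
        return weightTotal == 0 ? Double.NaN : weightedSum / weightTotal;
    }

    public static void main(String[] args) {
        UserBasedCf cf = new UserBasedCf();
        cf.addRating("alice", "m1", 5);
        cf.addRating("alice", "m2", 3);
        cf.addRating("bob", "m1", 4);
        cf.addRating("bob", "m2", 3);
        cf.addRating("bob", "m3", 4);
        cf.addRating("carol", "m1", 1);
        cf.addRating("carol", "m2", 5);
        cf.addRating("carol", "m3", 2);
        // Alice's ratings resemble Bob's more than Carol's, so the result leans toward Bob's 4.
        System.out.println("Predicted rating for alice/m3: " + cf.predict("alice", "m3"));
    }
}
```

A classifier in Weka, by contrast, would learn from per-instance attributes and a target label rather than from the overlap between users' rating histories.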

Re: Taste Vs Weka

Posted by Richard Tomsett <in...@gmail.com>.

Grant Ingersoll wrote:
> From a Mahout view, we are very much aiming at addressing the scaling 
> issue.  As for the GUI, I think that will always be a "contrib" for 
> Mahout, if one ever exists.  My personal goal for Mahout is to keep it 
> lean and easily usable in a wide variety of applications.  Just as 
> Lucene has made search a commodity in many ways, I think Mahout could 
> enable ML to be a commodity in 5 years.
>
> Also, a glaring difference between the two is Weka is GPL.  I'll leave 
> it to you to read all the discussions on ASL vs. GPL and do not want 
> to start that discussion here, as there is no point.
>
> Last, I imagine we will all coexist nicely.  Weka will be useful for 
> many tasks, and Mahout will be useful for many tasks and there will 
> certainly be overlap.
That makes a lot of sense :-)

Re: Taste Vs Weka

Posted by Cosmin Lehene <cl...@adobe.com>.

Sorry about that. I somehow replied to the wrong recipient.

Cosmin

Re: Taste Vs Weka

Posted by Cosmin Lehene <cl...@adobe.com>.

I think the simplest thing is to say that we use it the same way people use MySQL. That is, we have some servers on which we store some data.

Re: Taste Vs Weka

Posted by "Xiance SI(司宪策)" <ad...@gmail.com>.

+1
I think Mahout should focus on scalability and performance instead of a GUI;
that's what Hadoop is good at.

Xiance

Re: Taste Vs Weka

Posted by Grant Ingersoll <gs...@apache.org>.

On Aug 27, 2008, at 8:33 AM, Richard Tomsett wrote:

> There's quite a good description of WEKA and its capabilities on the  
> course page for a module I took this year: http://www.inf.ed.ac.uk/teaching/courses/dme/html/software2.html
>
> It's more a general suite of data-mining tools rather than a tool to  
> address a specific task like Taste (plus it's obviously not  
> implemented for parallel processing which could be problematic for  
> scaling up). From the link above:
>
>   * *Advantages*: The obvious advantage of a package like Weka is that
>     *a whole range of data preparation, feature selection and data
>     mining algorithms are integrated*. This means that only one data
>     format is needed, and trying out and comparing different
>     approaches becomes really easy. The package also comes with *a
>     GUI*, which should make it easier to use.

Yeah, it would be good for Mahout to adopt an approach for either
translating from ARFF to our format or just using ARFF (or whatever else
Weka does), but I don't want that to preclude us from innovating where
we need to innovate.
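As a rough illustration of the "translate from ARFF" option, the hypothetical Java sketch below reads the `@data` section of a dense, numeric-only ARFF file into `double[]` rows, mapping `?` (ARFF's missing-value marker) to `NaN`. `ArffToVectors` is an assumed name; it is neither Weka's ARFF loader nor any Mahout vector format, and it deliberately ignores nominal, string, and date attributes, sparse `{...}` rows, and quoted values.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Hypothetical sketch: converts dense, numeric-only ARFF text into double[] rows.
 * Not Weka's ArffLoader and not a Mahout format; real ARFF files need a full parser.
 */
public class ArffToVectors {

    public static List<double[]> parse(String arff) {
        List<double[]> rows = new ArrayList<>();
        int numAttributes = 0;
        boolean inData = false;
        for (String rawLine : arff.split("\\r?\\n")) {
            String line = rawLine.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank line or ARFF comment
            }
            if (!inData) {
                // ARFF header keywords are case-insensitive.
                String lower = line.toLowerCase(Locale.ROOT);
                if (lower.startsWith("@attribute")) {
                    numAttributes++;
                } else if (lower.startsWith("@data")) {
                    inData = true;
                }
                continue; // header lines carry no values
            }
            // Each dense data line holds one comma-separated value per attribute.
            String[] fields = line.split(",");
            if (fields.length != numAttributes) {
                throw new IllegalArgumentException(
                        "Expected " + numAttributes + " values but got " + fields.length + ": " + line);
            }
            double[] row = new double[numAttributes];
            for (int i = 0; i < numAttributes; i++) {
                String field = fields[i].trim();
                row[i] = field.equals("?") ? Double.NaN : Double.parseDouble(field);
            }
            rows.add(row);
        }
        return rows;
    }

    public static void main(String[] args) {
        String arff = String.join("\n",
                "% toy ratings file",
                "@relation ratings",
                "@attribute user numeric",
                "@attribute item numeric",
                "@attribute rating numeric",
                "@data",
                "1,101,5.0",
                "1,102,?",
                "2,101,3.5");
        for (double[] row : parse(arff)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Keeping the conversion in a separate step like this is one way to accept Weka's format at the edges without tying internal data structures to it.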

>
>
>   * *Disadvantages*: Probably the most important disadvantage of data
>     mining suites like this is that *they do not implement the newest
>     techniques*. For example the MLP implemented has a very basic
>     training algorithm (backprop with momentum), and the SVM only uses
>     polynomial kernels, and does not support numeric estimation. ...
>     *A third possible problem is scaling*. For difficult tasks on
>     large datasets, the running time can become quite long, and java
>     sometimes gives an OutOfMemory error. This problem can be reduced
>     by using the '-mx<size>' option when calling java, where <size> is
>     the memory size (e.g. '50m'). For large datasets it will always be
>     necessary to reduce the size to be able to work within reasonable
>     time limits. A fourth problem is that *the GUI does not implement
>     all the possible options*. Things that could be very useful, like
>     scoring of a test set, are not provided in the GUI, but can be
>     called from the command line interface. So sometimes it will be
>     necessary to switch between GUI and command line. Finally, *the
>     data preparation and visualisation techniques offered might not be
>     enough*. Most of them are very useful, but I think in most data
>     mining tasks you will need more to get to know the data well and
>     to get it in the right format.
>

From a Mahout view, we are very much aiming at addressing the scaling
issue.  As for the GUI, I think that will always be a "contrib" for  
Mahout, if one ever exists.  My personal goal for Mahout is to keep it  
lean and easily usable in a wide variety of applications.  Just as  
Lucene has made search a commodity in many ways, I think Mahout could  
enable ML to be a commodity in 5 years.

Also, a glaring difference between the two is Weka is GPL.  I'll leave  
it to you to read all the discussions on ASL vs. GPL and do not want  
to start that discussion here, as there is no point.

Last, I imagine we will all coexist nicely.  Weka will be useful for  
many tasks, and Mahout will be useful for many tasks and there will  
certainly be overlap.

Re: Taste Vs Weka

Posted by Richard Tomsett <in...@gmail.com>.

There's quite a good description of WEKA and its capabilities on the 
course page for a module I took this year: 
http://www.inf.ed.ac.uk/teaching/courses/dme/html/software2.html

It's more a general suite of data-mining tools than a tool for a
specific task like Taste (plus it's obviously not implemented for
parallel processing, which could be problematic for scaling up). From
the link above:

    * *Advantages*: The obvious advantage of a package like Weka is that
      *a whole range of data preparation, feature selection and data
      mining algorithms are integrated*. This means that only one data
      format is needed, and trying out and comparing different
      approaches becomes really easy. The package also comes with *a
      GUI*, which should make it easier to use.

    * *Disadvantages*: Probably the most important disadvantage of data
      mining suites like this is that *they do not implement the newest
      techniques*. For example the MLP implemented has a very basic
      training algorithm (backprop with momentum), and the SVM only uses
      polynomial kernels, and does not support numeric estimation. ...
      *A third possible problem is scaling*. For difficult tasks on
      large datasets, the running time can become quite long, and java
      sometimes gives an OutOfMemory error. This problem can be reduced
      by using the '-mx<size>' option when calling java, where <size> is
      the memory size (e.g. '50m'). For large datasets it will always be
      necessary to reduce the size to be able to work within reasonable
      time limits. A fourth problem is that *the GUI does not implement
      all the possible options*. Things that could be very useful, like
      scoring of a test set, are not provided in the GUI, but can be
      called from the command line interface. So sometimes it will be
      necessary to switch between GUI and command line. Finally, *the
      data preparation and visualisation techniques offered might not be
      enough*. Most of them are very useful, but I think in most data
      mining tasks you will need more to get to know the data well and
      to get it in the right format.
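For the memory point above, a small, hypothetical helper can confirm whether a heap setting actually took effect: it reports `Runtime.getRuntime().maxMemory()` in MiB. On current JVMs the maximum-heap flag is usually written `-Xmx`, e.g. `java -Xmx512m HeapCheck`; the reported figure can be somewhat lower than the flag value because of how the JVM sizes its heap regions. `HeapCheck` and `maxHeapMiB` are illustrative names only.

```java
/**
 * Hypothetical helper: prints the maximum heap the running JVM will try to use,
 * e.g. to check that a flag such as -Xmx512m was picked up.
 */
public class HeapCheck {

    /** Maximum heap in MiB, or -1 if the JVM reports no limit (Long.MAX_VALUE). */
    public static long maxHeapMiB() {
        long max = Runtime.getRuntime().maxMemory();
        return max == Long.MAX_VALUE ? -1 : max / (1024 * 1024);
    }

    public static void main(String[] args) {
        long mib = maxHeapMiB();
        if (mib < 0) {
            System.out.println("No explicit heap limit reported");
        } else {
            System.out.println("Max heap: " + mib + " MiB");
        }
    }
}
```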


Hope that's helpful :-)