Posted to dev@mahout.apache.org by "shunkai.fu" <sh...@roboo.com> on 2008/03/19 09:03:09 UTC

Re: Re: [jira] Commented: (MAHOUT-18) Embrace interoperability with other softwares

I think it is an open standard, but not everyone has the right to make proposals to it.

-----Original Message-----
From: Thilo Goetz [mailto:twgoetz@gmx.de]
Sent: March 19, 2008 15:35
To: mahout-dev@lucene.apache.org
Subject: Re: Re: [jira] Commented: (MAHOUT-18) Embrace interoperability with other softwares

What are the licensing conditions for PMML?  I looked,
but couldn't find anything on the website.

Thanks,
Thilo

shunkai.fu wrote:
> One well-known format is PMML (http://www.dmg.org/products.html)
> 
> -----Original Message-----
> From: Ted Dunning (JIRA) [mailto:jira@apache.org]
> Sent: March 19, 2008 8:56
> To: mahout-dev@lucene.apache.org
> Subject: [jira] Commented: (MAHOUT-18) Embrace interoperability with other softwares
> 
> 
>     [ https://issues.apache.org/jira/browse/MAHOUT-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12580199#action_12580199 ]
> 
> Ted Dunning commented on MAHOUT-18:
> -----------------------------------
> 
> 
> What are the possible formats?
> 
> Do any of the formats express parallel execution?
> 
> What are the criteria that we should use to decide which formats to support?
> 
> 
> 
>> Embrace interoperability with other softwares
>> ---------------------------------------------
>>
>>                 Key: MAHOUT-18
>>                 URL: https://issues.apache.org/jira/browse/MAHOUT-18
>>             Project: Mahout
>>          Issue Type: New JIRA Project
>>            Reporter: Shunkai Fu
>>            Priority: Minor
>>
>> This is an open issue. It relates to all possible components, existing
>> or yet to be created.
>> ML or DM models normally have two phases: training and scoring (or
>> predicting). If we agree that "updating" is an independent phase, we will
>> have three phases.
>> There is a lot of other ML/DM software available. We want users of Mahout
>> to be able to import models built with other software, update them,
>> and/or use them for scoring. To achieve this goal, we need to recognize
>> the commonly used formats.
>> Besides, users may choose Mahout because Mahout is fast at learning.
>> After a model is ready, they may export the trained model, view it with
>> a visualization tool, or import it into other software or applications
>> for scoring (or predicting). In this case, exporting to a widely
>> recognized format is expected.
>> Finally, I want to say that importing and exporting will not influence
>> the ongoing projects, so developers of other components need not worry
>> about this.
> 


Re: Re: Re: [jira] Commented: (MAHOUT-18) Embrace interoperability with other softwares

Posted by Thilo Goetz <tw...@gmx.de>.
Jason Rennie wrote:
> Looks like the standard already defines formats for some popular models,
> including SVM, regression, NNs.
> 
> Unclear to me how anyone could prevent us from using the standard unless it
> were patented.  

Exactly.  Usually in a truly open standard, the companies and
individuals that contribute waive any patent rights on the
standard.  Given what you can get patents on these days, there
might well be some protected IP lurking there.

It's just strange that an organization with members like this
does not provide very clear and up-front statements about their
IP/licensing policy (or none that I could find, anyway).

I'm probably overreacting.  All I'm trying to say is: before
you spend a lot of time on this, find out what the deal is.

--Thilo

> Copyright only protects works of art, which would include
> specific PMML files, but not the format.  One thing I noticed is that open
> source projects are allowed to take part in the PMML process for free...
> 
> My interpretation of PMML is that it represents a model.  As others have
> mentioned, prediction models (e.g. classification, regression; not
> clustering) basically have two parts: (1) learning, where the training data
> is used to train (optimize parameters for) the model, (2) prediction, where
> values are assigned to data points (documents/genes/etc.) based on the
> model.  In some cases (e.g. Naive Bayes, kNN), the "learning" is virtually
> non-existent and simply involves transforming the training data into a form
> that makes prediction easy/efficient.  In other cases (e.g. SVM, ordinal
> regression, NN, non-naive Bayesian Network), learning involves non-trivial
> optimization, often requiring much more memory & computation than that of
> prediction, and there is value in being able to "save" a model for use
> elsewhere.
> 
> The format is, of course, algorithm specific, so it's probably best to
> consider writing a PMML on an algorithm-by-algorithm basis...
> 
> Jason
> 


Re: Re: Re: [jira] Commented: (MAHOUT-18) Embrace interoperability with other softwares

Posted by Jason Rennie <jr...@gmail.com>.
Looks like the standard already defines formats for some popular models,
including SVM, regression, NNs.

Unclear to me how anyone could prevent us from using the standard unless it
were patented.  Copyright only protects works of art, which would include
specific PMML files, but not the format.  One thing I noticed is that open
source projects are allowed to take part in the PMML process for free...

My interpretation of PMML is that it represents a model.  As others have
mentioned, prediction models (e.g. classification, regression; not
clustering) basically have two parts: (1) learning, where the training data
is used to train (optimize parameters for) the model, (2) prediction, where
values are assigned to data points (documents/genes/etc.) based on the
model.  In some cases (e.g. Naive Bayes, kNN), the "learning" is virtually
non-existent and simply involves transforming the training data into a form
that makes prediction easy/efficient.  In other cases (e.g. SVM, ordinal
regression, NN, non-naive Bayesian Network), learning involves non-trivial
optimization, often requiring much more memory & computation than that of
prediction, and there is value in being able to "save" a model for use
elsewhere.

The format is, of course, algorithm specific, so it's probably best to
consider writing a PMML on an algorithm-by-algorithm basis...

Jason
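
To make the learning/prediction split above concrete, here is a minimal
sketch in Python. It is not Mahout code and does not produce PMML: the
KNNClassifier class, its method names, and the JSON layout are all
hypothetical, chosen only for illustration. It shows the kNN case, where
"learning" amounts to storing the training data, and how a saved model could
be reloaded elsewhere for prediction.

```python
import json
import math
from collections import Counter


class KNNClassifier:
    """Hypothetical k-nearest-neighbour classifier (illustration only)."""

    def __init__(self, k=3):
        self.k = k
        self.points = []  # stored feature vectors
        self.labels = []  # one label per stored vector

    def fit(self, points, labels):
        # "Learning": no optimization, the training set itself is the model.
        self.points = [list(p) for p in points]
        self.labels = list(labels)
        return self

    def predict(self, x):
        # "Prediction": rank stored points by Euclidean distance to x,
        # then take a majority vote among the k nearest.
        ranked = sorted(
            zip(self.points, self.labels),
            key=lambda pair: math.dist(pair[0], x),
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]

    def to_json(self):
        # Assumed interchange format: plain JSON standing in for PMML.
        return json.dumps(
            {"k": self.k, "points": self.points, "labels": self.labels}
        )

    @classmethod
    def from_json(cls, text):
        # Rebuild a usable model from the saved form, e.g. in another tool.
        data = json.loads(text)
        return cls(k=data["k"]).fit(data["points"], data["labels"])
```

A model trained in one place can then be serialized with to_json() and
restored with KNNClassifier.from_json(...) wherever scoring happens. For
models such as SVMs, fit() would contain the expensive optimization, which is
where saving the result pays off most.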