Posted to dev@mahout.apache.org by Thilo Goetz <tw...@gmx.de> on 2008/03/20 08:13:37 UTC

Re: [jira] Commented: (MAHOUT-18) Embrace interoperability with other softwares

Jason Rennie wrote:
> Looks like the format already has formats for some popular models, including
> SVM, regression, NNs.
> 
> Unclear to me how anyone could prevent us from using the standard unless it
> were patented.  

Exactly.  Usually in a truly open standard, the companies and
individuals that contribute waive any patent rights on the
standard.  Given what you can get patents on these days, there
might well be some protected IP lurking there.

It's just strange that an organization with members like this
does not provide very clear and up-front statements about their
IP/licensing policy (or none that I could find, anyway).

I'm probably overreacting.  All I'm trying to say is: before
you spend a lot of time on this, find out what the deal is.

--Thilo

> Copyright only protects works of art, which would include
> specific PMML files, but not the format.  One thing I noticed is that open
> source projects are allowed to take part in the PMML process for free...
> 
> My interpretation of PMML is that it represents a model.  As others have
> mentioned, prediction models (e.g. classification, regression; not
> clustering) basically have two parts: (1) learning, where the training data
> is used to train (optimize parameters for) the model, (2) prediction, where
> values are assigned to data points (documents/genes/etc.) based on the
> model.  In some cases (e.g. Naive Bayes, kNN), the "learning" is virtually
> non-existent and simply involves transforming the training data into a form
> that makes prediction easy/efficient.  In other cases (e.g. SVM, ordinal
> regression, NN, non-naive Bayesian Network), learning involves non-trivial
> optimization, often requiring much more memory & computation than that of
> prediction, and there is value in being able to "save" a model for use
> elsewhere.
> 
> The format is, of course, algorithm specific, so it's probably best to
> consider writing a PMML on an algorithm-by-algorithm basis...
> 
> Jason
>
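
To make the learning/prediction split quoted above concrete, here is a
minimal sketch. It assumes a toy nearest-centroid classifier and uses a
JSON file as a stand-in for a real model format such as PMML. The names
train, save_model, load_model and predict are hypothetical; none of them
come from Mahout or from the PMML specification.

```python
import json
import os
import tempfile


def train(points, labels):
    """Learning step: compute one centroid per class from the training data."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def save_model(model, path):
    """Persist the learned parameters; the file is tagged with its algorithm."""
    with open(path, "w") as f:
        json.dump({"algorithm": "nearest-centroid", "centroids": model}, f)


def load_model(path):
    """Restore a saved model, rejecting files written for another algorithm."""
    with open(path) as f:
        doc = json.load(f)
    if doc["algorithm"] != "nearest-centroid":
        raise ValueError("unsupported algorithm: " + doc["algorithm"])
    return doc["centroids"]


def predict(model, point):
    """Prediction step: return the label of the closest centroid."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(point, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))


def demo():
    """Train once, save, then predict from the restored model only."""
    points = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
    labels = ["a", "a", "b", "b"]
    path = os.path.join(tempfile.mkdtemp(), "model.json")
    save_model(train(points, labels), path)
    restored = load_model(path)
    return predict(restored, [0.2, 0.4]), predict(restored, [5.5, 4.8])
```

The training data is only needed by train; predict works from the saved
file alone, which is the value of being able to "save" a model. The
"algorithm" field in this made-up format mirrors the point that the
stored parameters are algorithm specific.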