Posted to dev@mahout.apache.org by Simon Vocella <vo...@gmail.com> on 2012/12/19 14:12:50 UTC

Fwd: mahout-pmml

Hi All,

As Grant suggested, I am forwarding the email about mahout-pmml.
I have already tried jpmml standalone and it works fine for me. The next
important step is to understand, or maybe create, an example using only
Mahout for each of the following models:

   - NeuralNetwork
   - RandomForest (implemented via Segmentation, which is a PMML version
   4.0 feature)
   - RegressionModel
   - TreeModel

The step after that is to write a converter that creates Mahout objects from
jpmml objects. This concerns only importing objects; I think exporting would
work in a similar way.

Do you agree? Are you interested in these models, or is Mahout focused on
other ones?

regards,
Simon

---------- Forwarded message ----------
From: Simon Vocella <vo...@gmail.com>
Date: Mon, Dec 17, 2012 at 1:50 AM
Subject: mahout-pmml
To: Grant Ingersoll <gs...@apache.org>
Cc: Marty Kube <ma...@beavercreekconsulting.com>


Hi Grant,

I have started this project: https://github.com/voxsim/mahout-pmml (I have
pushed only the skeleton for now). It integrates Mahout with jpmml
(http://code.google.com/p/jpmml/).

I read the wiki page about the Weka converter:
https://cwiki.apache.org/MAHOUT/creating-vectors-from-wekas-arff-format.html
I also read the article on integrating Mahout with Lucene:
http://searchhub.org/2010/03/16/integrating-apache-mahout-with-apache-lucene-and-solr-part-i-of-3/

In theory we need to do something similar to these, with one difference: we
convert models rather than vectors. Do I understand correctly?

I am asking you directly because you have this idea in mind. For now, jpmml
supports these models:

   - NeuralNetwork
   - RandomForest (implemented via Segmentation, which is a PMML version
   4.0 feature)
   - RegressionModel
   - TreeModel

Are you interested in these models, or is Mahout focused on other ones?

Simon
PS: Marty, before starting I need some answers, sorry XD

Re: mahout-pmml

Posted by Marty Kube <ma...@beavercreekconsulting.com>.
Hi Ted,
That makes some sense.  I'll probably take a crack at it.
Marty


On 12/27/2012 12:14 AM, Ted Dunning wrote:
> Marty,
>
> That sounds like a reasonable idea.  IF integrated, this would need to be a
> separate module in any case so for now, it might be easiest for you to
> simply develop this module independently so that you don't have to wait for
> others to commit partial results.


Re: mahout-pmml

Posted by Ted Dunning <te...@gmail.com>.
Marty,

That sounds like a reasonable idea.  IF integrated, this would need to be a
separate module in any case so for now, it might be easiest for you to
simply develop this module independently so that you don't have to wait for
others to commit partial results.



On Wed, Dec 26, 2012 at 6:52 PM, Marty Kube <
martykube@beavercreekconsulting.com> wrote:

> I took a look at JPMML...  At the bottom of it they have run a JAXB
> compiler on the PMML V4 schema to generate Java bindings.  I didn't see a
> lot of value add in JPMML beyond that.
>
> I'd say just add the schema and bindings generation to Mahout.  The value
> add here is model mapping from the JAXB generated model into the Mahout
> models.

Re: mahout-pmml

Posted by Marty Kube <ma...@beavercreekconsulting.com>.
I took a look at JPMML...  At the bottom of it they have run a JAXB
compiler on the PMML V4 schema to generate Java bindings.  I didn't see
a lot of value add in JPMML beyond that.

I'd say just add the schema and bindings generation to Mahout.  The 
value add here is model mapping from the JAXB generated model into the 
Mahout models.
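
A minimal sketch of the mapping step described above, for illustration only. It reads a PMML RegressionTable and copies the intercept and coefficients into a plain Java object. Several things here are assumptions and not from the thread: the JDK's DOM parser is swapped in for JAXB-generated bindings so the example stays self-contained, `LinearModel` is a hypothetical stand-in and not a Mahout class, the sample XML is a fragment and not a complete schema-valid PMML document, and every NumericPredictor is assumed to have the default exponent of 1.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PmmlRegressionImport {

    // Hypothetical sample: a PMML fragment, not a complete, schema-valid document.
    static final String SAMPLE =
        "<PMML xmlns=\"http://www.dmg.org/PMML-4_1\" version=\"4.1\">"
      + "<RegressionModel functionName=\"regression\">"
      + "<RegressionTable intercept=\"1.5\">"
      + "<NumericPredictor name=\"x1\" coefficient=\"2.0\"/>"
      + "<NumericPredictor name=\"x2\" coefficient=\"-0.5\"/>"
      + "</RegressionTable></RegressionModel></PMML>";

    /** Hypothetical stand-in for a Mahout-side linear model; NOT a Mahout class. */
    static final class LinearModel {
        final double intercept;
        final Map<String, Double> weights;

        LinearModel(double intercept, Map<String, Double> weights) {
            this.intercept = intercept;
            this.weights = weights;
        }

        /** intercept + sum of weight * feature value; missing features count as 0. */
        double predict(Map<String, Double> features) {
            double sum = intercept;
            for (Map.Entry<String, Double> w : weights.entrySet()) {
                sum += w.getValue() * features.getOrDefault(w.getKey(), 0.0);
            }
            return sum;
        }
    }

    /** Maps the first RegressionTable found in the XML into a LinearModel. */
    static LinearModel fromPmml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        NodeList tables = doc.getElementsByTagNameNS("*", "RegressionTable");
        if (tables.getLength() == 0) {
            throw new IllegalArgumentException("no RegressionTable element found");
        }
        Element table = (Element) tables.item(0);
        double intercept = Double.parseDouble(table.getAttribute("intercept"));

        // Assumption: exponent is 1 for every predictor, so it is ignored here.
        Map<String, Double> weights = new LinkedHashMap<>();
        NodeList predictors = table.getElementsByTagNameNS("*", "NumericPredictor");
        for (int i = 0; i < predictors.getLength(); i++) {
            Element p = (Element) predictors.item(i);
            weights.put(p.getAttribute("name"),
                        Double.parseDouble(p.getAttribute("coefficient")));
        }
        return new LinearModel(intercept, weights);
    }

    public static void main(String[] args) throws Exception {
        LinearModel model = fromPmml(SAMPLE);
        Map<String, Double> row = new LinkedHashMap<>();
        row.put("x1", 1.0);
        row.put("x2", 2.0);
        System.out.println(model.predict(row));
    }
}
```

With generated bindings, the DOM lookups would be replaced by getters on the generated classes, but the mapping logic into the target model would have the same shape.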

On 12/20/2012 06:13 AM, Grant Ingersoll wrote:
> From looking at PMML (http://www.dmg.org/v4-1/GeneralStructure.html), it seems that JPMML is not going to really get us there if it only supports the 4 models listed below.  I would think we could go through the structures supported in the link above and then map them to the algorithms that Mahout supports.  To start, perhaps it would make sense to focus on a few, like clustering and Naive Bayes; perhaps SGD will fit into the regression models.  Perhaps try to get K-Means and Naive Bayes to work first.
>
> FTR, I can only imagine how bloated these files are going to get since they use XML.  Thankfully, they won't be used to power the internals, just to support interoperability.
>
> -Grant


Re: mahout-pmml

Posted by Grant Ingersoll <gs...@apache.org>.
From looking at PMML (http://www.dmg.org/v4-1/GeneralStructure.html), it seems that JPMML is not going to really get us there if it only supports the 4 models listed below.  I would think we could go through the structures supported in the link above and then map them to the algorithms that Mahout supports.  To start, perhaps it would make sense to focus on a few, like clustering and Naive Bayes; perhaps SGD will fit into the regression models.  Perhaps try to get K-Means and Naive Bayes to work first.
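
To make the K-Means case concrete, here is a rough sketch of what exporting cluster centers to a PMML ClusteringModel element could look like. It is an illustration under assumptions, not Mahout or JPMML code: the class and method names are hypothetical, the centers are passed as plain arrays and not as Mahout vectors, field names are assumed to contain no characters that need XML escaping, and the element and attribute names follow my reading of the PMML 4.1 clustering schema and should be checked against the DMG specification. Only the model element is produced, without the surrounding Header and DataDictionary.

```java
public class KMeansPmmlExport {

    /**
     * Renders cluster centers as a PMML ClusteringModel element (fragment only).
     * fields: input field names; centers: one row of coordinates per cluster.
     */
    static String toClusteringModel(String[] fields, double[][] centers) {
        StringBuilder sb = new StringBuilder();
        sb.append("<ClusteringModel functionName=\"clustering\"")
          .append(" modelClass=\"centerBased\"")
          .append(" numberOfClusters=\"").append(centers.length).append("\">\n");

        sb.append("  <MiningSchema>\n");
        for (String f : fields) {
            sb.append("    <MiningField name=\"").append(f).append("\"/>\n");
        }
        sb.append("  </MiningSchema>\n");

        // K-Means is assumed here to use squared Euclidean distance.
        sb.append("  <ComparisonMeasure kind=\"distance\">")
          .append("<squaredEuclidean/></ComparisonMeasure>\n");

        for (String f : fields) {
            sb.append("  <ClusteringField field=\"").append(f).append("\"/>\n");
        }

        for (int i = 0; i < centers.length; i++) {
            sb.append("  <Cluster name=\"").append(i).append("\">\n");
            sb.append("    <Array n=\"").append(centers[i].length)
              .append("\" type=\"real\">");
            for (int j = 0; j < centers[i].length; j++) {
                if (j > 0) {
                    sb.append(' ');
                }
                sb.append(centers[i][j]);
            }
            sb.append("</Array>\n");
            sb.append("  </Cluster>\n");
        }

        sb.append("</ClusteringModel>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] fields = {"x", "y"};
        double[][] centers = {{1.0, 2.0}, {3.5, 4.0}};
        System.out.print(toClusteringModel(fields, centers));
    }
}
```

Each center becomes one space-separated Array, so the output grows with the number of clusters times the number of dimensions.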

FTR, I can only imagine how bloated these files are going to get since they use XML.  Thankfully, they won't be used to power the internals, just to support interoperability.

-Grant

On Dec 19, 2012, at 8:12 AM, Simon Vocella wrote:

> Hi All,
> 
> As Grant suggested, I am forwarding the email about mahout-pmml.
> I have already tried jpmml standalone and it works fine for me. The next important step is to understand, or maybe create, an example using only Mahout for each of the following models:
> NeuralNetwork
> RandomForest (implemented via Segmentation, which is a PMML version 4.0 feature)
> RegressionModel
> TreeModel
> The step after that is to write a converter that creates Mahout objects from jpmml objects. This concerns only importing objects; I think exporting would work in a similar way.
> 
> Do you agree? Are you interested in these models, or is Mahout focused on other ones?
> 
> regards,
> Simon

--------------------------------------------
Grant Ingersoll
http://www.lucidworks.com