Posted to dev@mahout.apache.org by Gokhan Capan <gk...@gmail.com> on 2013/04/06 17:47:12 UTC

Any interest in Data Preparation?

Hi,

Are you guys interested in implementations of Weka-like filters,
such as NominalToBinary, Discretize, etc.?

I started to implement in-memory versions running on Mahout Matrix, and
plan to extend the implementations so they could run on sequence files of
IntWritable, VectorWritable pairs.

--
Gokhan
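
[Editor's sketch] To make the two filters named above concrete, here is a
minimal in-memory illustration on plain double[][] arrays. It is not the
code from this thread and does not use the Weka or Mahout APIs. The class
name FilterSketch and both method signatures are hypothetical. It assumes
nominal values are stored as integer codes 0..numValues-1, and it uses
equal-width binning for discretization, which is only one of several
possible strategies.

```java
import java.util.Arrays;

// Hypothetical illustration only; not Weka's or Mahout's API.
class FilterSketch {

  // Replaces nominal column `col` (assumed to hold integer codes
  // 0..numValues-1) with numValues 0/1 indicator columns in its place.
  static double[][] nominalToBinary(double[][] data, int col, int numValues) {
    double[][] out = new double[data.length][];
    for (int r = 0; r < data.length; r++) {
      double[] row = data[r];
      int category = (int) row[col];
      if (category < 0 || category >= numValues) {
        throw new IllegalArgumentException("category out of range: " + row[col]);
      }
      double[] expanded = new double[row.length - 1 + numValues];
      // Columns before the nominal one are copied unchanged.
      System.arraycopy(row, 0, expanded, 0, col);
      // Exactly one indicator is set per row.
      expanded[col + category] = 1.0;
      // Columns after the nominal one are shifted right.
      System.arraycopy(row, col + 1, expanded, col + numValues,
          row.length - col - 1);
      out[r] = expanded;
    }
    return out;
  }

  // Equal-width discretization: replaces each value in column `col`
  // with its bin index in [0, bins).
  static double[][] discretize(double[][] data, int col, int bins) {
    double min = Double.POSITIVE_INFINITY;
    double max = Double.NEGATIVE_INFINITY;
    for (double[] row : data) {
      min = Math.min(min, row[col]);
      max = Math.max(max, row[col]);
    }
    double width = (max - min) / bins;
    double[][] out = new double[data.length][];
    for (int r = 0; r < data.length; r++) {
      out[r] = data[r].clone();
      // A constant column (width 0) maps everything to bin 0.
      int bin = width == 0 ? 0 : (int) ((data[r][col] - min) / width);
      // The maximum value would land one past the end; clamp it.
      out[r][col] = Math.min(bin, bins - 1);
    }
    return out;
  }

  public static void main(String[] args) {
    // Column 0 is nominal with 3 possible values; column 1 is numeric.
    double[][] data = { {0, 1.0}, {2, 5.0}, {1, 9.0} };
    System.out.println(Arrays.deepToString(nominalToBinary(data, 0, 3)));
    System.out.println(Arrays.deepToString(discretize(data, 1, 2)));
  }
}
```

Both methods return new arrays and leave the input untouched, so the same
per-row logic could in principle be applied one vector at a time. Note that
equal-width discretization first needs the column minimum and maximum, so a
distributed version would require an extra pass over the data.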

Re: Any interest in Data Preparation?

Posted by Gokhan Capan <gk...@gmail.com>.
I consider this a tiny step in a larger effort to make Mahout more usable.

Ted,
I actually started this to evaluate my implementation of factorization machines. Once I have tried it on some data, I'm going to write about it on the thread you started; we could talk about the details there.

As for the data tools, even if they cannot be used immediately, they can stay there until we figure out a way to integrate them. They are still at a very early stage anyway.



On Apr 6, 2013, at 11:17 PM, Ted Dunning <te...@gmail.com> wrote:

> I differ a bit: I think these are important to have in general.
> 
> Unfortunately, however, our current command line structure would make these
> really inefficient to use.
> 
> On Sat, Apr 6, 2013 at 9:22 AM, Sebastian Schelter
> <ss...@googlemail.com> wrote:
> 
>> In general, I think it is great to have such tools. But they should be
>> developed in the context of a specific algorithm or problem.
>> 
>> On 06.04.2013 17:47, Gokhan Capan wrote:
>>> Hi,
>>> 
>>> Are you guys interested in implementations of Weka-like filters,
>>> such as NominalToBinary, Discretize, etc.?
>>> 
>>> I started to implement in-memory versions running on Mahout Matrix, and
>>> plan to extend the implementations so they could run on sequence files of
>>> IntWritable, VectorWritable pairs.
>>> 
>>> --
>>> Gokhan
>> 
>> 

Re: Any interest in Data Preparation?

Posted by Ted Dunning <te...@gmail.com>.
I differ a bit: I think these are important to have in general.

Unfortunately, however, our current command line structure would make these
really inefficient to use.




On Sat, Apr 6, 2013 at 9:22 AM, Sebastian Schelter
<ss...@googlemail.com> wrote:

> In general, I think it is great to have such tools. But they should be
> developed in the context of a specific algorithm or problem.
>
> On 06.04.2013 17:47, Gokhan Capan wrote:
> > Hi,
> >
> > Are you guys interested in implementations of Weka-like filters,
> > such as NominalToBinary, Discretize, etc.?
> >
> > I started to implement in-memory versions running on Mahout Matrix, and
> > plan to extend the implementations so they could run on sequence files of
> > IntWritable, VectorWritable pairs.
> >
> > --
> > Gokhan
> >
>
>

Re: Any interest in Data Preparation?

Posted by Sebastian Schelter <ss...@googlemail.com>.
In general, I think it is great to have such tools. But they should be
developed in the context of a specific algorithm or problem.

On 06.04.2013 17:47, Gokhan Capan wrote:
> Hi,
> 
> Are you guys interested in implementations of Weka-like filters,
> such as NominalToBinary, Discretize, etc.?
> 
> I started to implement in-memory versions running on Mahout Matrix, and
> plan to extend the implementations so they could run on sequence files of
> IntWritable, VectorWritable pairs.
> 
> --
> Gokhan
>