Posted to dev@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2009/06/20 07:34:40 UTC

Re: [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to use Vector Writable

Sounds pretty good to me.

On Fri, Jun 19, 2009 at 6:38 PM, Grant Ingersoll <gs...@apache.org> wrote:

> So, should we just go to having everything be binary and then have
> Input/Output utilities that can take the binary format and output GSON?
>  Seems like w/ Canopy, since it's used for feeding into other algorithms
> that it should output Writable as well, otherwise we're still going to be
> round tripping through Text.
>
> Then, it would be pretty easy to write an M/R job that takes Vectors and
> outputs asFormatString(), right?
>

Re: [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to use Vector Writable

Posted by Grant Ingersoll <gs...@apache.org>.
Seems like Canopy would need to be made Writable too, no?

On Jun 20, 2009, at 7:04 AM, Grant Ingersoll wrote:

>
> On Jun 20, 2009, at 3:01 AM, Robert Burrell Donkin wrote:
>
>>
>> Perhaps it would be better to move the conversion code (e.g. tabular ->
>> Vectors) from examples into either core or a new module so it can be
>> more easily maintained and reused
>
> +1.  Or utils, as I picture utils being the place where we keep  
> things that aren't core, but are still useful.  Of course, we also  
> have, in core, o.a.m.utils, I believe.  The difference, in my mind,  
> is that the utils module is dependent on core, not the other way  
> around, which is why I put the Lucene extraction stuff in there.
>
> You up for a patch for this?  I think I have some time to work on  
> converting the M/R if someone else can take on the I/O to/from the
> user.
>
> -Grant



Re: [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to use Vector Writable

Posted by Grant Ingersoll <gs...@apache.org>.
On Jun 20, 2009, at 3:01 AM, Robert Burrell Donkin wrote:

>
> Perhaps it would be better to move the conversion code (e.g. tabular ->
> Vectors) from examples into either core or a new module so it can be
> more easily maintained and reused

+1.  Or utils, as I picture utils being the place where we keep things  
that aren't core, but are still useful.  Of course, we also have, in  
core, o.a.m.utils, I believe.  The difference, in my mind, is that the  
utils module is dependent on core, not the other way around, which is  
why I put the Lucene extraction stuff in there.

You up for a patch for this?  I think I have some time to work on  
converting the M/R if someone else can take on the I/O to/from the user.

-Grant

Re: [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to use Vector Writable

Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Saturday, June 20, 2009, Ted Dunning <te...@gmail.com> wrote:
> Sounds pretty good to me.

+1

> On Fri, Jun 19, 2009 at 6:38 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
>> So, should we just go to having everything be binary and then have
>> Input/Output utilities that can take the binary format and output GSON?
>>  Seems like w/ Canopy, since it's used for feeding into other algorithms
>> that it should output Writable as well, otherwise we're still going to be
>> round tripping through Text.

+1

Perhaps it would be better to move the conversion code (e.g. tabular ->
Vectors) from examples into either core or a new module so it can be
more easily maintained and reused

- Robert

>>
>> Then, it would be pretty easy to write an M/R job that takes Vectors and
>> outputs asFormatString(), right?
>>
>