Posted to dev@mahout.apache.org by Ted Dunning <te...@gmail.com> on 2009/06/20 07:34:40 UTC
Re: [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to
use Vector Writable
Sounds pretty good to me.
On Fri, Jun 19, 2009 at 6:38 PM, Grant Ingersoll <gs...@apache.org> wrote:
> So, should we just go to having everything be binary and then have
> Input/Output utilities that can take the binary format and output GSON?
> Seems like w/ Canopy, since it's used for feeding into other algorithms
> that it should output Writable as well, otherwise we're still going to be
> round tripping through Text.
>
> Then, it would be pretty easy to write a M/R job that takes Vectors and
> outputs asFormatString(), right?
>
Re: [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to use Vector Writable
Posted by Grant Ingersoll <gs...@apache.org>.
Seems like Canopy would need to be made Writable too, no?
On Jun 20, 2009, at 7:04 AM, Grant Ingersoll wrote:
>
> On Jun 20, 2009, at 3:01 AM, Robert Burrell Donkin wrote:
>
>>
>> Perhaps it would be better to move the conversion code (eg tabular ->
>> Vectors) from examples into either core or a new module so it can be
>> more easily maintained and reused
>
> +1. Or utils, as I picture utils being the place where we keep
> things that aren't core, but are still useful. Of course, we also
> have, in core, o.a.m.utils, I believe. The difference, in my mind,
> is that the utils module is dependent on core, not the other way
> around, which is why I put the Lucene extraction stuff in there.
>
> You up for a patch for this? I think I have some time to work on
> converting the M/R if someone else can take on the I/O to/from the
> user.
>
> -Grant
Re: [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to use Vector Writable
Posted by Grant Ingersoll <gs...@apache.org>.
On Jun 20, 2009, at 3:01 AM, Robert Burrell Donkin wrote:
>
> Perhaps it would be better to move the conversion code (eg tabular ->
> Vectors) from examples into either core or a new module so it can be
> more easily maintained and reused
+1. Or utils, as I picture utils being the place where we keep things
that aren't core, but are still useful. Of course, we also have, in
core, o.a.m.utils, I believe. The difference, in my mind, is that the
utils module is dependent on core, not the other way around, which is
why I put the Lucene extraction stuff in there.
You up for a patch for this? I think I have some time to work on
converting the M/R if someone else can take on the I/O to/from the user.
-Grant
Re: [jira] Commented: (MAHOUT-136) Change Canopy MR Implementation to
use Vector Writable
Posted by Robert Burrell Donkin <ro...@gmail.com>.
On Saturday, June 20, 2009, Ted Dunning <te...@gmail.com> wrote:
> Sounds pretty good to me.
+1
> On Fri, Jun 19, 2009 at 6:38 PM, Grant Ingersoll <gs...@apache.org> wrote:
>
>> So, should we just go to having everything be binary and then have
>> Input/Output utilities that can take the binary format and output GSON?
>> Seems like w/ Canopy, since it's used for feeding into other algorithms
>> that it should output Writable as well, otherwise we're still going to be
>> round tripping through Text.
+1
Perhaps it would be better to move the conversion code (eg tabular ->
Vectors) from examples into either core or a new module so it can be
more easily maintained and reused
- Robert
>>
>> Then, it would be pretty easy to write a M/R job that takes Vectors and
>> outputs asFormatString(), right?
>>
>
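
For illustration only, here is a minimal sketch of the idea discussed in this thread: keep a canopy in binary form between jobs and produce text only at the edge, for people. It is not Mahout's actual code. The class CanopySketch, its fields, and the text layout returned by asFormatString() are all hypothetical. The write/readFields methods mirror the shape of Hadoop's Writable interface, but the class deliberately uses only java.io so it runs without Hadoop on the classpath.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical stand-in for a Writable canopy; not Mahout's actual class.
public class CanopySketch {
    private int id;
    private double[] center;

    // No-arg constructor, as Writable implementations conventionally provide.
    public CanopySketch() {
        this(0, new double[0]);
    }

    public CanopySketch(int id, double[] center) {
        this.id = id;
        this.center = center.clone();
    }

    // Same shape as Writable.write(DataOutput): id, length, then the values.
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeInt(center.length);
        for (double v : center) {
            out.writeDouble(v);
        }
    }

    // Same shape as Writable.readFields(DataInput): reads what write() wrote.
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
        int n = in.readInt();
        center = new double[n];
        for (int i = 0; i < n; i++) {
            center[i] = in.readDouble();
        }
    }

    // Human-readable form; this layout is an assumption made for the sketch.
    public String asFormatString() {
        return "C" + id + Arrays.toString(center);
    }

    // Serialize to the binary form that would be passed between jobs.
    public static byte[] toBytes(CanopySketch c) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            c.write(out);
        }
        return buf.toByteArray();
    }

    // Rebuild a canopy from its binary form.
    public static CanopySketch fromBytes(byte[] bytes) throws IOException {
        CanopySketch c = new CanopySketch();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            c.readFields(in);
        }
        return c;
    }

    public static void main(String[] args) throws IOException {
        CanopySketch original = new CanopySketch(3, new double[] {1.5, 0.1, 2.0});
        // Binary round trip: no parsing of text between stages.
        CanopySketch copy = fromBytes(toBytes(original));
        // Text is produced only at the end, for display.
        System.out.println(copy.asFormatString());
    }
}
```

In a real job the same two methods would be called by the framework rather than by hand, and a separate output step could map each record to its asFormatString() value when a text dump is wanted.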