Posted to dev@mahout.apache.org by Pat Ferrel <pa...@gmail.com> on 2014/07/01 02:39:02 UTC

Re: cf/couccurence code

No argument, just trying to decide whether to create core-scala or keep dumping anything not Spark-dependent into math-scala.

On Jun 30, 2014, at 9:32 AM, Ted Dunning <te...@gmail.com> wrote:

On Mon, Jun 30, 2014 at 8:36 AM, Pat Ferrel <pa...@gmail.com> wrote:

> Speaking for Sebastian and Dmitriy (with some ignorance), I think the idea
> was to isolate things with Spark dependencies, something like what we did
> before with Hadoop.


Go ahead and speak for me as well here!

I think isolating the dependencies is crucial for platform nimbleness
(nimbility?)
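
A minimal sketch of the isolation being discussed, assuming the DRM DSL as it stood at the time (package names and signatures here are from memory and should be treated as assumptions): algorithm code is written against the engine-neutral DrmLike abstraction and compiles against math-scala alone, while only the code that constructs a context touches Spark.

    // Engine-agnostic half: depends on math-scala only.
    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._

    object CooccurrenceSketch {
      // A'A over a distributed row matrix; no Spark types appear here.
      def selfCooccurrence(drmA: DrmLike[Int]): CheckpointedDrm[Int] =
        (drmA.t %*% drmA).checkpoint()
    }

    // Engine-specific half: lives in the spark module.
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.sparkbindings._

    object Runner extends App {
      // mahoutSparkContext builds the implicit DistributedContext that
      // drmParallelize needs; this is the only Spark-bound piece.
      implicit val ctx = mahoutSparkContext(masterUrl = "local", appName = "cf-sketch")
      val drmA = drmParallelize(dense((1, 1, 0), (0, 1, 1)), numPartitions = 2)
      println(CooccurrenceSketch.selfCooccurrence(drmA).collect)
    }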


Re: cf/couccurence code

Posted by Anand Avati <av...@gluster.org>.
Pat,

I agree that the proposal is not ideal, and your points are of course valid.
All I'm saying is that the code-vs-test module split is a separate issue, not
a non-issue; it is just independent of the "right location for the cf code"
problem.

Here's a PR for just the code move: https://github.com/apache/mahout/pull/26

Re: cf/couccurence code

Posted by Pat Ferrel <pa...@gmail.com>.
Hmm, that doesn't seem like a good idea. Since there is precedent, and for the sake of argument, I'll go ahead and do it, but:

1) it means the wrong module will fail a build test when the error is not in the test
2) it is a kind of lie about the dependencies of a module. A consumer would think they can include only math-scala in a project, but some ill-defined parts of it are useless without Spark, so no real separation can be made. I understand that this is so some hypothetical future engine module can replace Spark, but that module would have to come with an awful lot of stuff, including many of the build tests for math-scala. This only adds to my concern over this approach, which will leave the real and current Spark implementation misleading and confusing in its structure.

But as I said, for the sake of avoiding further argument, I'll separate impl from test.


Re: cf/couccurence code

Posted by Anand Avati <av...@gluster.org>.
If that is the case, why not commit this much already (i.e., separate modules
for code and test), since that has been the "norm" thus far (see DSSVD,
DSPCA, etc.)? Fixing the code-vs-test module split could be a separate
task/activity (which I'm happy to pick up) on which the cf code move need not
depend.
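
For reference, the precedent being cited: dssvd and dspca are implemented in math-scala against DrmLike, and only their tests bind a concrete engine. A hedged sketch of a call site follows; the exact signature is from memory and should be treated as an assumption.

    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.decompositions._

    object DssvdPrecedent {
      // dssvd is engine-agnostic; drmA may be backed by any engine at runtime.
      // Assumed signature: dssvd(drmA, k, p, q) => (drmU, drmV, s).
      def topFactors(drmA: DrmLike[Int], k: Int) = {
        val (drmU, drmV, s) = dssvd(drmA, k = k, p = 15, q = 0)
        (drmU.checkpoint(), drmV.checkpoint(), s)
      }
    }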



Re: cf/couccurence code

Posted by Pat Ferrel <pa...@gmail.com>.
I already did the code and tests in separate modules; that works, but it is not a good way to go, IMO. If there are tests that will work in math-scala, then we can put the code in math-scala; I couldn't find a way to do that.
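
A Spark-free math-scala test would have to exercise the math in-core, along the lines of the sketch below, assuming ScalaTest and the scalabindings DSL, with plain A'A standing in for the real cooccurrence computation. But a test like this cannot cover parallel execution, which is the sticking point.

    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.scalatest.FunSuite

    class InCoreCooccurrenceSuite extends FunSuite {
      test("A'A counts co-occurring items") {
        // Rows are users, columns are items; 1 marks an interaction.
        val a = dense((1, 1, 0), (0, 1, 1))
        val ata = a.t %*% a
        assert(ata(0, 1) == 1.0) // items 0 and 1 co-occur for user 0
        assert(ata(1, 2) == 1.0) // items 1 and 2 co-occur for user 1
      }
    }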



Re: cf/couccurence code

Posted by Anand Avati <av...@gluster.org>.
As I write this, I'm not completely sure how to address the issue (code and
tests in separate modules), but I will give it a shot soon.



Re: cf/couccurence code

Posted by Pat Ferrel <pa...@gmail.com>.
OK, I'm spending more time on this than I have to spare. The test class extends MahoutLocalContext, which provides an implicit Spark context, and I haven't found a way to test parallel execution of cooccurrence without it. So far the only obvious option is to put cf into math-scala, but the tests would have to remain in spark, and that seems like trouble, so I'd rather not do it.
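
Roughly the pattern in question; trait and package names are as they stood in the spark module at the time, and the details should be treated as assumptions.

    import org.apache.mahout.math.drm._
    import org.apache.mahout.math.drm.RLikeDrmOps._
    import org.apache.mahout.math.scalabindings._
    import org.apache.mahout.math.scalabindings.RLikeOps._
    import org.apache.mahout.sparkbindings.test.MahoutLocalContext
    import org.scalatest.FunSuite

    // MahoutLocalContext supplies the implicit local-Spark DistributedContext
    // that drmParallelize needs, which is exactly what ties the test to spark.
    class CooccurrenceSuite extends FunSuite with MahoutLocalContext {
      test("distributed A'A matches in-core A'A") {
        val inCoreA = dense((1, 1, 0), (0, 1, 1))
        val drmA = drmParallelize(inCoreA, numPartitions = 2)
        val ata = (drmA.t %*% drmA).checkpoint().collect
        assert((ata - inCoreA.t %*% inCoreA).norm < 1e-10)
      }
    }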

I suspect that as more math-scala-consuming algos get implemented, this issue will proliferate: we will have implementations that do not require Spark but tests that do. We could create a new sub-project that allows for this, I suppose, but a new sub-project would require changes to SparkEngine and mahout's script.

If someone (Anand?) wants to offer a PR with some way around this, I'd be happy to integrate it.
