You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by Eugen Paraschiv <ha...@gmail.com> on 2010/08/10 15:17:07 UTC

question about condition checking in Mahout

Hi,
We are starting to use Mahout for a recommendation project at work - it has
been a good experience so far. We went through the integration process
without any notable hurdles, and we’ve kept some notes along the way. I will
start with a general observation and follow up with some more in the future,
so that we may get some feedback from the community and help with patches
where needed.
One thing I’ve noticed is that there doesn’t seem to be a consistent way of
validating method arguments.
There is a Preconditions utility class in *G*uava (the formed Google
Collections) for checking various stuff such as not nulls or boolean
conditions:
Preconditions.checkNotNull( object ); or
Preconditions.checkState(  expression, “errorMessage” );
This can be used instead of manual checks such as:
if( delimiterTwo < 0 ){
      throw new IllegalArgumentException( "Bad line: " + line );
}
This kind of checking has many advantages:
- it ensures a consistent exception logic, making sure you always throw the
exact same exceptions in speciffic situations (such as checking an argument)
- it makes the code more readable and reduces the cognitive load for a
client going through it
- it keeps the method at a single level of abstraction - throwing an
exception when a parameter is invalid is a low level of abstraction whereas
the following logic is a higher level of abstraction - the two should not be
mixed
Seeing how *Guava* is already a dependency, it makes sense to use it to it
full potential. Reducing the cognitive load of the code may not be that
relevant in some cases, but it is very important when dealing with very long
methods such as *FileDataModel* - *processLine*.
Is this a direction that makes sense for Mahout? If so, how should I go
forward with helping - is the standard route of opening JIRA issues and
submitting patches OK?
Thanks.
Eugen.

Re: question about condition checking in Mahout

Posted by Sean Owen <sr...@gmail.com>.
Sounds good.

I'm not worried about different behavior, but just inconsistent
implementation of that behavior internally

I think you are likely welcome to be a little aggressive in adding
argument checks. If you flag a precondition that shouldn't be
restricted, it is easy to discover and may well be something that
should be restricted.

On Fri, Aug 13, 2010 at 9:18 AM, Eugen Paraschiv <ha...@gmail.com> wrote:
> Sure, makes sense to do this according to the boyscout principle and based
> on patches. I will start working on such a patch for the code area I'm
> working with, not for the whole project. As for an inconsistent state for
> the condition checking logic, it should not be an issue, as the
> Preconditions class throws the exact *same exceptions* as the ones that are
> thrown manually, so in fact it should behave exactly the same for a client
> of the class.
> About clustering and classification, I have not worked with that portion of
> the code yet - I'm focusing on recommendation algorithms for now.
> Thanks for the feedback, I'll make sure to open the JIRA issue.
> Eugen.

Re: question about condition checking in Mahout

Posted by Eugen Paraschiv <ha...@gmail.com>.
Sure, makes sense to do this according to the boyscout principle and based
on patches. I will start working on such a patch for the code area I'm
working with, not for the whole project. As for an inconsistent state for
the condition checking logic, it should not be an issue, as the
Preconditions class throws the exact *same exceptions* as the ones that are
thrown manually, so in fact it should behave exactly the same for a client
of the class.
About clustering and classification, I have not worked with that portion of
the code yet - I'm focusing on recommendation algorithms for now.
Thanks for the feedback, I'll make sure to open the JIRA issue.
Eugen.

On Tue, Aug 10, 2010 at 11:00 PM, Ted Dunning <te...@gmail.com> wrote:

> Guava is a dependency that I added not so long ago.
>
> I think it is good practice to do this going forward, but it really will be
> another one of those forever cleanup tasks to get it done completely.
>
> Certainly, it is fine to add checks in the new style, but I don't see an
> issue leaving old checks in the old style unless somebody is actually
> working on them.  I don't think that makes our consistency all that much
> worse than it is and it makes the modified code better.
>
> More important in my mind is the data-structure rationalization that we
> need
> to do for clustering and classification.
>
> On Tue, Aug 10, 2010 at 10:18 AM, Sean Owen <sr...@gmail.com> wrote:
>
> > Is Guava already a dependency somehow? maybe so, I never looked.
> >
> > If so I find this a decent idea. It's a big undertaking to convert all
> > arg checking, and perhaps tighten up argument checking in other
> > places. You'd be welcome to do so and file a JIRA with a patch.
> >
> > My concern would be a half-way patch. Then we'd be in a more
> > inconsistent state and that's not good.
> >
> > On Tue, Aug 10, 2010 at 8:17 AM, Eugen Paraschiv <ha...@gmail.com>
> > wrote:
> > > Hi,
> > > We are starting to use Mahout for a recommendation project at work - it
> > has
> > > been a good experience so far. We went through the integration process
> > > without any notable hurdles, and we’ve kept some notes along the way. I
> > will
> > > start with a general observation and follow up with some more in the
> > future,
> > > so that we may get some feedback from the community and help with
> patches
> > > where needed.
> > > One thing I’ve noticed is that there doesn’t seem to be a consistent
> way
> > of
> > > validating method arguments.
> > > There is a Preconditions utility class in *G*uava (the formed Google
> > > Collections) for checking various stuff such as not nulls or boolean
> > > conditions:
> > > Preconditions.checkNotNull( object ); or
> > > Preconditions.checkState(  expression, “errorMessage” );
> > > This can be used instead of manual checks such as:
> > > if( delimiterTwo < 0 ){
> > >      throw new IllegalArgumentException( "Bad line: " + line );
> > > }
> > > This kind of checking has many advantages:
> > > - it ensures a consistent exception logic, making sure you always throw
> > the
> > > exact same exceptions in speciffic situations (such as checking an
> > argument)
> > > - it makes the code more readable and reduces the cognitive load for a
> > > client going through it
> > > - it keeps the method at a single level of abstraction - throwing an
> > > exception when a parameter is invalid is a low level of abstraction
> > whereas
> > > the following logic is a higher level of abstraction - the two should
> not
> > be
> > > mixed
> > > Seeing how *Guava* is already a dependency, it makes sense to use it to
> > it
> > > full potential. Reducing the cognitive load of the code may not be that
> > > relevant in some cases, but it is very important when dealing with very
> > long
> > > methods such as *FileDataModel* - *processLine*.
> > > Is this a direction that makes sense for Mahout? If so, how should I go
> > > forward with helping - is the standard route of opening JIRA issues and
> > > submitting patches OK?
> > > Thanks.
> > > Eugen.
> > >
> >
>

Re: question about condition checking in Mahout

Posted by Ted Dunning <te...@gmail.com>.
Guava is a dependency that I added not so long ago.

I think it is good practice to do this going forward, but it really will be
another one of those forever cleanup tasks to get it done completely.

Certainly, it is fine to add checks in the new style, but I don't see an
issue leaving old checks in the old style unless somebody is actually
working on them.  I don't think that makes our consistency all that much
worse than it is and it makes the modified code better.

More important in my mind is the data-structure rationalization that we need
to do for clustering and classification.

On Tue, Aug 10, 2010 at 10:18 AM, Sean Owen <sr...@gmail.com> wrote:

> Is Guava already a dependency somehow? maybe so, I never looked.
>
> If so I find this a decent idea. It's a big undertaking to convert all
> arg checking, and perhaps tighten up argument checking in other
> places. You'd be welcome to do so and file a JIRA with a patch.
>
> My concern would be a half-way patch. Then we'd be in a more
> inconsistent state and that's not good.
>
> On Tue, Aug 10, 2010 at 8:17 AM, Eugen Paraschiv <ha...@gmail.com>
> wrote:
> > Hi,
> > We are starting to use Mahout for a recommendation project at work - it
> has
> > been a good experience so far. We went through the integration process
> > without any notable hurdles, and we’ve kept some notes along the way. I
> will
> > start with a general observation and follow up with some more in the
> future,
> > so that we may get some feedback from the community and help with patches
> > where needed.
> > One thing I’ve noticed is that there doesn’t seem to be a consistent way
> of
> > validating method arguments.
> > There is a Preconditions utility class in *G*uava (the formed Google
> > Collections) for checking various stuff such as not nulls or boolean
> > conditions:
> > Preconditions.checkNotNull( object ); or
> > Preconditions.checkState(  expression, “errorMessage” );
> > This can be used instead of manual checks such as:
> > if( delimiterTwo < 0 ){
> >      throw new IllegalArgumentException( "Bad line: " + line );
> > }
> > This kind of checking has many advantages:
> > - it ensures a consistent exception logic, making sure you always throw
> the
> > exact same exceptions in speciffic situations (such as checking an
> argument)
> > - it makes the code more readable and reduces the cognitive load for a
> > client going through it
> > - it keeps the method at a single level of abstraction - throwing an
> > exception when a parameter is invalid is a low level of abstraction
> whereas
> > the following logic is a higher level of abstraction - the two should not
> be
> > mixed
> > Seeing how *Guava* is already a dependency, it makes sense to use it to
> it
> > full potential. Reducing the cognitive load of the code may not be that
> > relevant in some cases, but it is very important when dealing with very
> long
> > methods such as *FileDataModel* - *processLine*.
> > Is this a direction that makes sense for Mahout? If so, how should I go
> > forward with helping - is the standard route of opening JIRA issues and
> > submitting patches OK?
> > Thanks.
> > Eugen.
> >
>

Re: question about condition checking in Mahout

Posted by Sean Owen <sr...@gmail.com>.
Is Guava already a dependency somehow? maybe so, I never looked.

If so I find this a decent idea. It's a big undertaking to convert all
arg checking, and perhaps tighten up argument checking in other
places. You'd be welcome to do so and file a JIRA with a patch.

My concern would be a half-way patch. Then we'd be in a more
inconsistent state and that's not good.

On Tue, Aug 10, 2010 at 8:17 AM, Eugen Paraschiv <ha...@gmail.com> wrote:
> Hi,
> We are starting to use Mahout for a recommendation project at work - it has
> been a good experience so far. We went through the integration process
> without any notable hurdles, and we’ve kept some notes along the way. I will
> start with a general observation and follow up with some more in the future,
> so that we may get some feedback from the community and help with patches
> where needed.
> One thing I’ve noticed is that there doesn’t seem to be a consistent way of
> validating method arguments.
> There is a Preconditions utility class in *G*uava (the formed Google
> Collections) for checking various stuff such as not nulls or boolean
> conditions:
> Preconditions.checkNotNull( object ); or
> Preconditions.checkState(  expression, “errorMessage” );
> This can be used instead of manual checks such as:
> if( delimiterTwo < 0 ){
>      throw new IllegalArgumentException( "Bad line: " + line );
> }
> This kind of checking has many advantages:
> - it ensures a consistent exception logic, making sure you always throw the
> exact same exceptions in speciffic situations (such as checking an argument)
> - it makes the code more readable and reduces the cognitive load for a
> client going through it
> - it keeps the method at a single level of abstraction - throwing an
> exception when a parameter is invalid is a low level of abstraction whereas
> the following logic is a higher level of abstraction - the two should not be
> mixed
> Seeing how *Guava* is already a dependency, it makes sense to use it to it
> full potential. Reducing the cognitive load of the code may not be that
> relevant in some cases, but it is very important when dealing with very long
> methods such as *FileDataModel* - *processLine*.
> Is this a direction that makes sense for Mahout? If so, how should I go
> forward with helping - is the standard route of opening JIRA issues and
> submitting patches OK?
> Thanks.
> Eugen.
>