You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@mahout.apache.org by "Sean Owen (JIRA)" <ji...@apache.org> on 2010/04/14 11:26:52 UTC

[jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

    [ https://issues.apache.org/jira/browse/MAHOUT-379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856812#action_12856812 ] 

Sean Owen commented on MAHOUT-379:
----------------------------------

Yeah let's take some time to get this right. At the moment I see four notions of equivalence in Vector (which is down from five!), and this seems like one too many:

==: of course
equals(): compares values, names, not implementation
equivalent(): compares values only
strictEquivalence(): compares values, names, implementation

equals() ought to be strict-ish. Its current implementation is fine, though conventional wisdom is that it's better to only consider instances of the same class equals() in order to avoid transitivity problems. I think that's a valid concern here.

Therefore I submit that equals() should be replaced with what strictEquivalence() does.

(And then, of course, fix the underlying issue that was raised too!)


> SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent
> ---------------------------------------------------------------------------------
>
>                 Key: MAHOUT-379
>                 URL: https://issues.apache.org/jira/browse/MAHOUT-379
>             Project: Mahout
>          Issue Type: Bug
>          Components: Math
>    Affects Versions: 0.4
>            Reporter: Danny Leshem
>            Priority: Minor
>             Fix For: 0.3
>
>
> When a SequentialAccessSparseVector is serialized and deserialized using VectorWritable, the result vector and the original vector are equivalent, yet equals returns false.
> The following unit-test reproduces the problem:
> {code}
> @Test
> public void testSequentialAccessSparseVectorEquals() throws Exception {
>     final Vector v = new SequentialAccessSparseVector(1);
>     final VectorWritable vectorWritable = new VectorWritable(v);
>     final VectorWritable vectorWritable2 = new VectorWritable();
>     writeAndRead(vectorWritable, vectorWritable2);
>     final Vector v2 = vectorWritable2.get();
>     assertTrue(AbstractVector.equivalent(v, v2));
>     assertEquals(v, v2); // This line fails!
> }
> private void writeAndRead(Writable toWrite, Writable toRead) throws IOException {
>     final ByteArrayOutputStream baos = new ByteArrayOutputStream();
>     final DataOutputStream dos = new DataOutputStream(baos);
>     toWrite.write(dos);
>     final ByteArrayInputStream bais = new ByteArrayInputStream(baos.toByteArray());
>     final DataInputStream dis = new DataInputStream(bais);
>     toRead.readFields(dis);
> }
> {code}
> The problem seems to be that the original vector name is null, while the new vector's name is an empty string. The same issue probably also happens with RandomAccessSparseVector.
> SequentialAccessSparseVectorWritable (line 40):
> {code}
> dataOutput.writeUTF(getName() == null ? "" : getName());
> {code}
> RandomAccessSparseVectorWritable (line 42):
> {code}
> dataOutput.writeUTF(this.getName() == null ? "" : this.getName());
> {code}
> The simplest fix is probably to change the default Vector's name from null to the empty string.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

Posted by Jake Mannix <ja...@gmail.com>.
+1

-jake

On Apr 14, 2010 3:20 PM, "Jeff Eastman" <jd...@windwardsolutions.com> wrote:

Ted Dunning wrote: > > On Wed, Apr 14, 2010 at 12:53 PM, Sean Owen <
srowen@gmail.com> wrote: > >   >...
+1 from the creator thereof, even. Especially since they never got used.

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

Posted by Jeff Eastman <jd...@windwardsolutions.com>.
Ted Dunning wrote:
> On Wed, Apr 14, 2010 at 12:53 PM, Sean Owen <sr...@gmail.com> wrote:
>
>   
>>> I would actually prefer ripping names out of the base vectors entirely.
>>>  They should decorate the mathematical vector, but as their use is
>>>       
>> decidedly
>>     
>>> non-mathematical and application specific (we basically don't use them at
>>> all currently, it should be noted, and remember the YAGNI principle...),
>>> keeping them in the base mahout-math vector classes seems wrong.
>>>       
>> You read my mind. I agree entirely and it simplifies things here a lot.
>> I'd also say this about labels, but first things first perhaps.
>>
>> ...
>> If I can get any other +1s I'll prepare a patch.
>>
>>     
>
>
> +1
>
>   
+1 from the creator thereof, even. Especially since they never got used.

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

Posted by Ted Dunning <te...@gmail.com>.
On Wed, Apr 14, 2010 at 12:53 PM, Sean Owen <sr...@gmail.com> wrote:

>
> > I would actually prefer ripping names out of the base vectors entirely.
> >  They should decorate the mathematical vector, but as their use is
> decidedly
> > non-mathematical and application specific (we basically don't use them at
> > all currently, it should be noted, and remember the YAGNI principle...),
> > keeping them in the base mahout-math vector classes seems wrong.
>
> You read my mind. I agree entirely and it simplifies things here a lot.
> I'd also say this about labels, but first things first perhaps.
>
> ...
> If I can get any other +1s I'll prepare a patch.
>


+1

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

Posted by Sean Owen <sr...@gmail.com>.
On Wed, Apr 14, 2010 at 3:28 PM, Jake Mannix <ja...@gmail.com> wrote:
> What is the transitivity problem?  If (a instanceof VClassA), (b instanceof
> VClassB) and (c instanceof VClassC), if all three equals() methods compare
> the same things (ie values, names, not implementation), then a.equals(b) &&
> b.equals(c) ==> a.equals(c), right?

The potential issue comes in two flavors.

Say the current vector implementations equals() without regard to
implementation, which is by no means wrong per se. But then say
someone implements MyVector which is only equals() to MyVectors --
which is conventional for equals() implementations. This breaks
transitivity since a DenseVector might equals() a MyVector but not
vice versa. .. and I now see that's really a symmetry problem not
transitivity. But oh well.

That's just a coding error but a subtle one. The other possibility is
implementing a, I dunno, ColoredVector which adds colors to each
dimension. Even if ColoredVector implements equals() correctly and
without regard to implementation class, transitivity breaks when two
ColoredVectors with different colors but identical values are compared
to a standard Vector implementation. They're both equals() to it but
not equals() each other.


> I would actually prefer ripping names out of the base vectors entirely.
>  They should decorate the mathematical vector, but as their use is decidedly
> non-mathematical and application specific (we basically don't use them at
> all currently, it should be noted, and remember the YAGNI principle...),
> keeping them in the base mahout-math vector classes seems wrong.

You read my mind. I agree entirely and it simplifies things here a lot.
I'd also say this about labels, but first things first perhaps.


> If they weren't in the base class at all, we could have just
>
> == : ref check
> equals : check values, not impl

(I support not checking implementation, though we have to be cognizant
of the issues above. For example a NamedVector implementation later,
or any extension, would have to not consider a name as part of the
state for purposes of equality. Which could be a fine restriction for
this sort of class.)

If I can get any other +1s I'll prepare a patch.

Re: [jira] Commented: (MAHOUT-379) SequentialAccessSparseVector.equals does not agree with AbstractVector.equivalent

Posted by Jake Mannix <ja...@gmail.com>.
Ok, back on list with this then (Thanks Danny for reminding us to deal with
this perennial issue we have!)


On Wed, Apr 14, 2010 at 2:26 AM, Sean Owen (JIRA) <ji...@apache.org> wrote:

>
> Yeah let's take some time to get this right. At the moment I see four
> notions of equivalence in Vector (which is down from five!), and this seems
> like one too many:
>
> ==: of course
> equals(): compares values, names, not implementation
> equivalent(): compares values only
> strictEquivalence(): compares values, names, implementation
>

I like that summary, nice and concise.


> equals() ought to be strict-ish. Its current implementation is fine, though
> conventional wisdom is that it's better to only consider instances of the
> same class equals() in order to avoid transitivity problems. I think that's
> a valid concern here.
>

What is the transitivity problem?  If (a instanceof VClassA), (b instanceof
VClassB) and (c instanceof VClassC), if all three equals() methods compare
the same things (ie values, names, not implementation), then a.equals(b) &&
b.equals(c) ==> a.equals(c), right?


> Therefore I submit that equals() should be replaced with what
> strictEquivalence() does.
>
> (And then, of course, fix the underlying issue that was raised too!)
>

I would actually prefer ripping names out of the base vectors entirely.
 They should decorate the mathematical vector, but as their use is decidedly
non-mathematical and application specific (we basically don't use them at
all currently, it should be noted, and remember the YAGNI principle...),
keeping them in the base mahout-math vector classes seems wrong.

If they weren't in the base class at all, we could have just

== : ref check
equals : check values, not impl

  -jake