You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@mahout.apache.org by Lance Norskog <go...@gmail.com> on 2010/11/17 08:47:01 UTC

About arithmetic on Vectors

The Mahout Vector implementations of arithmetic have what I think is a bug.

class AbstractVector
addTo(Vector v) {
     Iterator<Element> it = iterateNonZero();
     while (it.hasNext()) {
       Element e = it.next();
       int index = e.index();
       v.setQuick(index, v.getQuick(index) + e.get());
     }
   }

Because "this" walks only its non-zero elements, matching columns in the 
other vector are ignored. That is:

[1, 2, 0, 4].addTo([1, 1, 1, 1]) = [2,  3, 0, 5]

public Vector plus(Vector) also does this at around line 371:

     Vector result = like().assign(this);
     Iterator<Element> iter = x.iterateNonZero();
     while (iter.hasNext()) {
       Element e = iter.next();
       int index = e.index();
       result.setQuick(index, this.getQuick(index) + e.get());
     }
     return result;


All of the Vector subclasses that store data (Dense, RandomAccess, 
SequentialAccess) don't override these two methods.
The unit tests don't catch this mistake- they need a wider range of test 
data. A lot of code uses these two methods, and are getting bogus 
results. Because Maven is weird for me, I can't run the test suites on a 
fixed version.

Lance Norskog

.















Re: About arithmetic on Vectors

Posted by Lance Norskog <go...@gmail.com>.
Argh! Yes, I get it now.


On Thu, Nov 18, 2010 at 7:16 AM, Ted Dunning <te...@gmail.com> wrote:
> Also, if you have suggested test values, we can easily add them to the
> vector test prototype.
>
> On Thu, Nov 18, 2010 at 6:21 AM, Grant Ingersoll <gs...@apache.org>wrote:
>
>> > All of the Vector subclasses that store data (Dense, RandomAccess,
>> SequentialAccess) don't override these two methods.
>> > The unit tests don't catch this mistake- they need a wider range of test
>> data. A lot of code uses these two methods, and are getting bogus results.
>> Because Maven is weird for me, I can't run the test suites on a fixed
>> version.
>>
>> What commands are you running?
>



-- 
Lance Norskog
goksron@gmail.com

Re: About arithmetic on Vectors

Posted by Ted Dunning <te...@gmail.com>.
Also, if you have suggested test values, we can easily add them to the
vector test prototype.

On Thu, Nov 18, 2010 at 6:21 AM, Grant Ingersoll <gs...@apache.org>wrote:

> > All of the Vector subclasses that store data (Dense, RandomAccess,
> SequentialAccess) don't override these two methods.
> > The unit tests don't catch this mistake- they need a wider range of test
> data. A lot of code uses these two methods, and are getting bogus results.
> Because Maven is weird for me, I can't run the test suites on a fixed
> version.
>
> What commands are you running?

Re: About arithmetic on Vectors

Posted by Grant Ingersoll <gs...@apache.org>.
On Nov 17, 2010, at 2:47 AM, Lance Norskog wrote:

> The Mahout Vector implementations of arithmetic have what I think is a bug.
> 
> class AbstractVector
> addTo(Vector v) {
>    Iterator<Element> it = iterateNonZero();
>    while (it.hasNext()) {
>      Element e = it.next();
>      int index = e.index();
>      v.setQuick(index, v.getQuick(index) + e.get());
>    }
>  }
> 
> Because "this" walks only its non-zero elements, matching columns in the other vector are ignored. That is:
> 
> [1, 2, 0, 4].addTo([1, 1, 1, 1]) = [2,  3, 0, 5]

I think the above code is correct, it is getting the non-zero values from the "this" vector and adding them to the vector passed in.   In other words, it's adding this vector to the vector V, whereas I think you are implying it is the other way around.

The above should actually be:
> [1, 2, 0, 4].addTo([1, 1, 1, 1]) = [2,  3, 1, 5]


As the Javadoc says:
/** Add the elements to the other vector and results are stored in that vector. */

I added the following test:
@Test
  public void testAddTo() throws Exception {
    Vector v = new DenseVector(4);
    Vector w = new DenseVector(4);
    v.setQuick(0, 1);
    v.setQuick(1, 2);
    v.setQuick(2, 0);
    v.setQuick(3, 4);

    w.setQuick(0, 1);
    w.setQuick(1, 1);
    w.setQuick(2, 1);
    w.setQuick(3, 1);

    v.addTo(w);
    Vector gold = new DenseVector(new double[]{2, 3, 1, 5});
    assertTrue(w.equals(gold));
    assertFalse(v.equals(gold));
  }

I also clarified the docs.

> 
> 
> All of the Vector subclasses that store data (Dense, RandomAccess, SequentialAccess) don't override these two methods.
> The unit tests don't catch this mistake- they need a wider range of test data. A lot of code uses these two methods, and are getting bogus results. Because Maven is weird for me, I can't run the test suites on a fixed version.

What commands are you running?