Posted to user@mahout.apache.org by marco turchi <ma...@gmail.com> on 2011/07/20 15:28:21 UTC

Problem with method Plus in the Vector class

Dear All,
I am seeing some strange behaviour when I use the plus method of Vector.

I have a RandomAccessSparseVector. If I add a positive number, I get a new
vector where each element is the old value plus that number, but if I add a
negative number, the new vector has one more entry:


RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
distV.setQuick(0,1);
double mean = 1;
RandomAccessSparseVector app = (RandomAccessSparseVector) distV.plus(mean);

the output is
{0:2.0}

if I have
double mean = -1;
RandomAccessSparseVector app = (RandomAccessSparseVector) distV.plus(mean);

the output is
{1:1.0,0:-1.0}

I must be doing something wrong. Do you have any idea where the problem is?

Thanks a lot in advance
Marco

Re: Problem with method Plus in the Vector class

Posted by Ted Dunning <te...@gmail.com>.
Nah... just the kind of blindness that keeps me from seeing the blueberries
on the second shelf.

Happens all the time in my world.

On Wed, Jul 20, 2011 at 10:25 AM, Benson Margulies <bi...@gmail.com>wrote:

> > My strong expectation is that this is a case of refrigerator blindness.
>
> Small scale snow-blindness?

Re: Problem with method Plus in the Vector class

Posted by Benson Margulies <bi...@gmail.com>.
>
> My strong expectation is that this is a case of refrigerator blindness.

Small scale snow-blindness?

>
> On Wed, Jul 20, 2011 at 9:22 AM, Marco Turchi <ma...@gmail.com>wrote:
>
>> Hi
>> you are right about the sparse vector issue... but I'm constructing distV
>> in the same way changing only + and - in the variable mean. In both the
>> cases, I should have the same number of entries in the final vector.
>>
>> Thanks a lot for your help
>> Marco
>>
>>
>> On 20 Jul 2011, at 17:42, Ted Dunning wrote:
>>
>>  You constructed the first vector with a dimension of 1.  It looks like you
>>> constructed the second one with a larger dimension of 2.
>>>
>>> When you offset a sparse vector, all of the zeros become non-zero and the
>>> vector becomes dense.  This results in a bunch of cells being created.
>>>
>>> On Wed, Jul 20, 2011 at 6:28 AM, marco turchi <marco.turchi@gmail.com
>>> >wrote:
>>>
>>>  Dear All,
>>>> I have a strange behaviour when I use the method Plus for Vector.
>>>>
>>>> I have a RandomAccessSparseVector vector, if I add a positive number, I
>>>> got
>>>> a new Vector where each element is the sum of the old value plus the
>>>> positive number. While if I add a negative number, the new vector has 1
>>>> more
>>>> entry:
>>>>
>>>>
>>>> RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
>>>> distV.setQuick(0,1);
>>>> double mean = 1;
>>>> RandomAccessSparseVector app =
>>>> (RandomAccessSparseVector)(distV.plus(mean));
>>>>
>>>> the output is
>>>> {0:2.0}
>>>>
>>>> if I have
>>>> double mean = -1;
>>>> RandomAccessSparseVector app =
>>>> (RandomAccessSparseVector)(distV.plus(mean));
>>>>
>>>> the output is
>>>> {1:1.0,0:-1.0}
>>>>
>>>> For sure I'm doing something wrong. Do you have any ideas where the
>>>> problem
>>>> is?
>>>>
>>>> Thanks a lot in advance
>>>> Marco
>>>>
>>>>
>>
>

Re: Problem with method Plus in the Vector class

Posted by Ted Dunning <te...@gmail.com>.
Weighted cosine distance with selected interaction components, to be
pedantic.

On Wed, Jul 20, 2011 at 2:59 PM, Marco Turchi <ma...@gmail.com>wrote:

> Sorry I completely misunderstood, I guess you were talking about the
> weighted cosine distance.
>
> Great, I'll try.
>
> Thanks again for your useful suggestions
> Marco
>
>
> On 20 Jul 2011, at 23:38, Ted Dunning wrote:
>
>  Actually, I would suggest weighting words by something like tf-idf
>> weighting.
>>
>> http://en.wikipedia.org/wiki/Tf%E2%80%93idf
>>
>> log or sqrt(tf) is often good instead of linear tf.  The standard
>> log((N+1)
>> / (df+1)) definition is usually good.
>>
>> On Wed, Jul 20, 2011 at 2:29 PM, Marco Turchi <marco.turchi@gmail.com
>> >wrote:
>>
>>  Whao, thanks a lot, it seems very interesting. What you suggested means
>>> to
>>> weight each single words differently when I apply the cosine similarity.
>>> Each weight is the frequency of the word in the seed documents. It is not
>>> clear to me how to compute and use the anomalously common cooccurrences,
>>> but
>>> I'll investigate.
>>>
>>> Thanks a lot
>>> Marco
>>>
>>>
>>>
>>> On 20 Jul 2011, at 20:36, Ted Dunning wrote:
>>>
>>> frequency weighted cosine distance
>>>
>>>>
>>>>
>>>
>>>
>

Re: Problem with method Plus in the Vector class

Posted by Marco Turchi <ma...@gmail.com>.
Sorry, I completely misunderstood; I guess you were talking about the
weighted cosine distance.

Great, I'll try.

Thanks again for your useful suggestions
Marco

On 20 Jul 2011, at 23:38, Ted Dunning wrote:

> Actually, I would suggest weighting words by something like tf-idf
> weighting.
>
> http://en.wikipedia.org/wiki/Tf%E2%80%93idf
>
> log or sqrt(tf) is often good instead of linear tf.  The standard  
> log((N+1)
> / (df+1)) definition is usually good.
>
> On Wed, Jul 20, 2011 at 2:29 PM, Marco Turchi  
> <ma...@gmail.com>wrote:
>
>> Whao, thanks a lot, it seems very interesting. What you suggested  
>> means to
>> weight each single words differently when I apply the cosine  
>> similarity.
>> Each weight is the frequency of the word in the seed documents. It  
>> is not
>> clear to me how to compute and use the anomalously common  
>> cooccurrences, but
>> I'll investigate.
>>
>> Thanks a lot
>> Marco
>>
>>
>>
>> On 20 Jul 2011, at 20:36, Ted Dunning wrote:
>>
>> frequency weighted cosine distance
>>>
>>
>>


Re: Problem with method Plus in the Vector class

Posted by Ted Dunning <te...@gmail.com>.
Actually, I would suggest weighting words by something like tf-idf
weighting.

http://en.wikipedia.org/wiki/Tf%E2%80%93idf

log(tf) or sqrt(tf) is often better than linear tf.  The standard
log((N+1) / (df+1)) definition of idf is usually good.
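
As a rough illustration of that weighting (plain Java, illustrative names,
not a Mahout API):

import java.util.HashMap;
import java.util.Map;

public class TfIdfWeights {
  // Sketch of the weighting above: sqrt(tf) damping for term frequency
  // and the smoothed log((N + 1) / (df + 1)) inverse document frequency.
  public static Map<String, Double> weigh(Map<String, Integer> termFreqs,
                                          Map<String, Integer> docFreqs,
                                          int numDocs) {
    Map<String, Double> weights = new HashMap<String, Double>();
    for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
      Integer df = docFreqs.get(e.getKey());
      double tf = Math.sqrt(e.getValue());
      double idf = Math.log((numDocs + 1.0) / ((df == null ? 0 : df) + 1.0));
      weights.put(e.getKey(), tf * idf);
    }
    return weights;
  }
}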

On Wed, Jul 20, 2011 at 2:29 PM, Marco Turchi <ma...@gmail.com>wrote:

> Whao, thanks a lot, it seems very interesting. What you suggested means to
> weight each single words differently when I apply the cosine similarity.
> Each weight is the frequency of the word in the seed documents. It is not
> clear to me how to compute and use the anomalously common cooccurrences, but
> I'll investigate.
>
> Thanks a lot
> Marco
>
>
>
> On 20 Jul 2011, at 20:36, Ted Dunning wrote:
>
>  frequency weighted cosine distance
>>
>
>

Re: Problem with method Plus in the Vector class

Posted by Marco Turchi <ma...@gmail.com>.
Whoa, thanks a lot, this seems very interesting. What you suggested means
weighting each word differently when I apply the cosine similarity, where
each weight is the frequency of the word in the seed documents. It is not
clear to me how to compute and use the anomalously common cooccurrences,
but I'll investigate.

Thanks a lot
Marco



On 20 Jul 2011, at 20:36, Ted Dunning wrote:

> frequency weighted cosine distance


Re: Problem with method Plus in the Vector class

Posted by Ted Dunning <te...@gmail.com>.
Just use a frequency weighted cosine distance and index words and
anomalously common cooccurrences.  That gives you pretty much all you are
asking for.

Also, your progressive increase approach sounds a lot like k-means.  You
might take a look to see if that could help.
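
To make "frequency weighted" concrete, a minimal sketch (plain Java,
illustrative only; the weights could be per-term tf-idf values):

public class WeightedCosine {
  // Cosine similarity where each component is scaled by a per-term weight
  // before the dot product and norms are taken. Not a Mahout API.
  public static double similarity(double[] a, double[] b, double[] weights) {
    double dot = 0.0;
    double normA = 0.0;
    double normB = 0.0;
    for (int i = 0; i < a.length; i++) {
      double wa = weights[i] * a[i];
      double wb = weights[i] * b[i];
      dot += wa * wb;
      normA += wa * wa;
      normB += wb * wb;
    }
    return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / Math.sqrt(normA * normB);
  }
}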

On Wed, Jul 20, 2011 at 11:33 AM, Marco Turchi <ma...@gmail.com>wrote:

> The problem of using the cosine similarity being an Euclidean distance, if
> I'm not wrong, is that it groups the document inside a sphere, so I was
> wondering to use some the Mahalonobis distance, because it allows me to
> group the document inside an ellipsoid.  This means to compute the
> covariance matrix of the seed documents.
> I looked around searching for other approaches, but at the moment I'm
> stalled on the Mahalonobis distance and its covariance matrix.
> Any suggestions are very welcome
>

Re: Problem with method Plus in the Vector class

Posted by Marco Turchi <ma...@gmail.com>.
Thanks for the useful answer.

I agree with you about the usefulness of the Mahalanobis distance. My
problem is an outlier detection problem: I have a set of documents and I
want to discard the less informative ones, or those that are "distant"
from the core information of the set.

I'm implementing a search approach that iteratively selects the most
promising documents. Starting from a random seed of documents, it tries to
model the information in the seed and uses this model to find the next
most similar document. At the moment I model the information in the seed
documents by computing their centroid, and then I compute the cosine
similarity between the centroid and all the other documents. The most
similar document is added, the centroid is recomputed, and so on until the
stop criterion is satisfied.
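
In rough Java the loop looks something like this (a sketch only; maxSize
stands in for whatever stop criterion is used, and this is not a Mahout API):

import java.util.ArrayList;
import java.util.List;
import org.apache.mahout.math.Vector;

public class GreedySeedGrowing {
  // Grow the seed by repeatedly adding the candidate closest to the current centroid.
  public static List<Vector> grow(List<Vector> seed, List<Vector> candidates, int maxSize) {
    List<Vector> selected = new ArrayList<Vector>(seed);
    List<Vector> remaining = new ArrayList<Vector>(candidates);
    while (selected.size() < maxSize && !remaining.isEmpty()) {
      Vector centroid = centroid(selected);
      Vector best = null;
      double bestSim = Double.NEGATIVE_INFINITY;
      for (Vector doc : remaining) {
        double sim = cosine(centroid, doc);
        if (sim > bestSim) {
          bestSim = sim;
          best = doc;
        }
      }
      selected.add(best);
      remaining.remove(best);
    }
    return selected;
  }

  static Vector centroid(List<Vector> docs) {
    Vector sum = docs.get(0).like();
    for (Vector d : docs) {
      sum = sum.plus(d);
    }
    return sum.divide(docs.size());
  }

  static double cosine(Vector a, Vector b) {
    double denom = a.norm(2) * b.norm(2);
    return denom == 0.0 ? 0.0 : a.dot(b) / denom;
  }
}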

The problem with the cosine similarity, which behaves like a Euclidean
distance if I'm not wrong, is that it groups the documents inside a
sphere, so I was wondering about using the Mahalanobis distance instead,
because it lets me group the documents inside an ellipsoid. That means
computing the covariance matrix of the seed documents.
I have looked around for other approaches, but at the moment I'm stuck on
the Mahalanobis distance and its covariance matrix.
Any suggestions are very welcome

Thanks a lot
Marco



On 20 Jul 2011, at 19:52, Ted Dunning wrote:

> Mahalanobis distance as defined really depends on good estimates of
> covariance and that is a pretty dense computation.
>
> Generally, you need to define some restricted kind of covariance or  
> a strong
> prior distribution on covariance in order to get a reasonable  
> estimate in
> the kind of ill-posed problem that you are suggesting.  One  
> possibility is
> to assume that covariance is sparse with only a few non-diagonal  
> terms.  If
> you have some kind of heuristic for deciding where you will allow  
> non-zeros,
> then you can build an estimation technique.  Not surprisingly, I  
> recommend
> LLR for finding interesting points.
>
> Another option is to look at rank limited approximations like SVD  
> which are
> not dense, but do require less storage than fully dense  
> representations.
>
> Both of these can be viewed as decompositions, but this approach  
> probably
> only really helps with the SVD approach.
>
> If I were working on this problem, though, I would strongly question  
> the
> need for Mahalanobis distance in the first place.  It can be lovely,  
> but it
> really comes from an earlier time and I question the applicability  
> to large
> sparse matrices of counts.
>
> On Wed, Jul 20, 2011 at 10:28 AM, Marco Turchi  
> <ma...@gmail.com>wrote:
>
>> ...
>> I have the last question:
>> I need to compute the covariance matrix of some input data to  
>> compute the
>> Mahalanobis distance. In my data,  the number of features is bigger  
>> than the
>> number of docs, and these data are very sparse. I cannot compute the
>> covariance matrix using the classic approach, because it becomes  
>> very time
>> and computational expensive. Is there any implementation of an  
>> approximate
>> way to compute it in Mahout? I have had a look in the library, but  
>> I do not
>> find it.
>>
>> thanks for your help
>> Marco
>>
>>
>> On 20 Jul 2011, at 19:15, Ted Dunning wrote:
>>
>> Well, that is hard to understand without a complete test case.  In  
>> the
>>> first
>>> case you have a vector with one element constructed while in the  
>>> second
>>> case, you wind up with two elements and don't show the constructor.
>>>
>>> Write up a JIRA and attach a unit test.
>>>
>>> My strong expectation is that this is a case of refrigerator  
>>> blindness.
>>>
>>> On Wed, Jul 20, 2011 at 9:22 AM, Marco Turchi  
>>> <marco.turchi@gmail.com
>>>> wrote:
>>>
>>> Hi
>>>> you are right about the sparse vector issue... but I'm  
>>>> constructing distV
>>>> in the same way changing only + and - in the variable mean. In  
>>>> both the
>>>> cases, I should have the same number of entries in the final  
>>>> vector.
>>>>
>>>> Thanks a lot for your help
>>>> Marco
>>>>
>>>>
>>>> On 20 Jul 2011, at 17:42, Ted Dunning wrote:
>>>>
>>>> You constructed the first vector with a dimension of 1.  It looks  
>>>> like
>>>> you
>>>>
>>>>> constructed the second one with a larger dimension of 2.
>>>>>
>>>>> When you offset a sparse vector, all of the zeros become non- 
>>>>> zero and
>>>>> the
>>>>> vector becomes dense.  This results in a bunch of cells being  
>>>>> created.
>>>>>
>>>>> On Wed, Jul 20, 2011 at 6:28 AM, marco turchi <marco.turchi@gmail.com
>>>>>
>>>>>> wrote:
>>>>>>
>>>>>
>>>>> Dear All,
>>>>>
>>>>>> I have a strange behaviour when I use the method Plus for Vector.
>>>>>>
>>>>>> I have a RandomAccessSparseVector vector, if I add a positive  
>>>>>> number, I
>>>>>> got
>>>>>> a new Vector where each element is the sum of the old value  
>>>>>> plus the
>>>>>> positive number. While if I add a negative number, the new  
>>>>>> vector has 1
>>>>>> more
>>>>>> entry:
>>>>>>
>>>>>>
>>>>>> RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
>>>>>> distV.setQuick(0,1);
>>>>>> double mean = 1;
>>>>>> RandomAccessSparseVector app =
>>>>>> (RandomAccessSparseVector)(distV.plus(mean));
>>>>>>
>>>>>> the output is
>>>>>> {0:2.0}
>>>>>>
>>>>>> if I have
>>>>>> double mean = -1;
>>>>>> RandomAccessSparseVector app =
>>>>>> (RandomAccessSparseVector)(distV.plus(mean));
>>>>>>
>>>>>> the output is
>>>>>> {1:1.0,0:-1.0}
>>>>>>
>>>>>> For sure I'm doing something wrong. Do you have any ideas where  
>>>>>> the
>>>>>> problem
>>>>>> is?
>>>>>>
>>>>>> Thanks a lot in advance
>>>>>> Marco
>>>>>>
>>>>>>
>>>>>>
>>>>
>>


Re: Problem with method Plus in the Vector class

Posted by Ted Dunning <te...@gmail.com>.
Mahalanobis distance as defined really depends on good estimates of
covariance and that is a pretty dense computation.

Generally, you need to define some restricted kind of covariance or a strong
prior distribution on covariance in order to get a reasonable estimate in
the kind of ill-posed problem that you are suggesting.  One possibility is
to assume that covariance is sparse with only a few non-diagonal terms.  If
you have some kind of heuristic for deciding where you will allow non-zeros,
then you can build an estimation technique.  Not surprisingly, I recommend
LLR for finding interesting points.
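
As a concrete baseline for that restricted form, a diagonal-only covariance
(no off-diagonal terms at all) reduces the distance to a per-feature variance
scaling. A rough sketch, plain Java, illustrative only:

public class DiagonalMahalanobis {
  // With only the diagonal of the covariance kept (per-feature variances),
  // Mahalanobis distance reduces to a variance-scaled Euclidean distance.
  public static double distance(double[] x, double[] mean, double[] variance) {
    double sum = 0.0;
    for (int i = 0; i < x.length; i++) {
      double d = x[i] - mean[i];
      // A small floor keeps near-zero-variance features from dominating.
      sum += (d * d) / Math.max(variance[i], 1e-9);
    }
    return Math.sqrt(sum);
  }
}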

Another option is to look at rank limited approximations like SVD which are
not dense, but do require less storage than fully dense representations.

Both of these can be viewed as decompositions, but this approach probably
only really helps with the SVD approach.

If I were working on this problem, though, I would strongly question the
need for Mahalanobis distance in the first place.  It can be lovely, but it
really comes from an earlier time and I question the applicability to large
sparse matrices of counts.

On Wed, Jul 20, 2011 at 10:28 AM, Marco Turchi <ma...@gmail.com>wrote:

> ...
> I have the last question:
> I need to compute the covariance matrix of some input data to compute the
> Mahalanobis distance. In my data,  the number of features is bigger than the
> number of docs, and these data are very sparse. I cannot compute the
> covariance matrix using the classic approach, because it becomes very time
> and computational expensive. Is there any implementation of an approximate
> way to compute it in Mahout? I have had a look in the library, but I do not
> find it.
>
> thanks for your help
> Marco
>
>
> On 20 Jul 2011, at 19:15, Ted Dunning wrote:
>
>  Well, that is hard to understand without a complete test case.  In the
>> first
>> case you have a vector with one element constructed while in the second
>> case, you wind up with two elements and don't show the constructor.
>>
>> Write up a JIRA and attach a unit test.
>>
>> My strong expectation is that this is a case of refrigerator blindness.
>>
>> On Wed, Jul 20, 2011 at 9:22 AM, Marco Turchi <marco.turchi@gmail.com
>> >wrote:
>>
>>  Hi
>>> you are right about the sparse vector issue... but I'm constructing distV
>>> in the same way changing only + and - in the variable mean. In both the
>>> cases, I should have the same number of entries in the final vector.
>>>
>>> Thanks a lot for your help
>>> Marco
>>>
>>>
>>> On 20 Jul 2011, at 17:42, Ted Dunning wrote:
>>>
>>> You constructed the first vector with a dimension of 1.  It looks like
>>> you
>>>
>>>> constructed the second one with a larger dimension of 2.
>>>>
>>>> When you offset a sparse vector, all of the zeros become non-zero and
>>>> the
>>>> vector becomes dense.  This results in a bunch of cells being created.
>>>>
>>>> On Wed, Jul 20, 2011 at 6:28 AM, marco turchi <marco.turchi@gmail.com
>>>>
>>>>> wrote:
>>>>>
>>>>
>>>> Dear All,
>>>>
>>>>> I have a strange behaviour when I use the method Plus for Vector.
>>>>>
>>>>> I have a RandomAccessSparseVector vector, if I add a positive number, I
>>>>> got
>>>>> a new Vector where each element is the sum of the old value plus the
>>>>> positive number. While if I add a negative number, the new vector has 1
>>>>> more
>>>>> entry:
>>>>>
>>>>>
>>>>> RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
>>>>> distV.setQuick(0,1);
>>>>> double mean = 1;
>>>>> RandomAccessSparseVector app =
>>>>> (RandomAccessSparseVector)(distV.plus(mean));
>>>>>
>>>>> the output is
>>>>> {0:2.0}
>>>>>
>>>>> if I have
>>>>> double mean = -1;
>>>>> RandomAccessSparseVector app =
>>>>> (RandomAccessSparseVector)(distV.plus(mean));
>>>>>
>>>>> the output is
>>>>> {1:1.0,0:-1.0}
>>>>>
>>>>> For sure I'm doing something wrong. Do you have any ideas where the
>>>>> problem
>>>>> is?
>>>>>
>>>>> Thanks a lot in advance
>>>>> Marco
>>>>>
>>>>>
>>>>>
>>>
>

Re: Problem with method Plus in the Vector class

Posted by Marco Turchi <ma...@gmail.com>.
Great!!
Thanks
Marco
On 20 Jul 2011, at 19:43, Ted Dunning wrote:

> See here: http://issues.apache.org/jira/browse/MAHOUT
>
> On Wed, Jul 20, 2011 at 10:28 AM, Marco Turchi  
> <ma...@gmail.com>wrote:
>
>> Ok, I need to understand what a JIRA is then I'll attach a unit  
>> test ;-)
>>


Re: Problem with method Plus in the Vector class

Posted by Ted Dunning <te...@gmail.com>.
See here: http://issues.apache.org/jira/browse/MAHOUT

On Wed, Jul 20, 2011 at 10:28 AM, Marco Turchi <ma...@gmail.com>wrote:

> Ok, I need to understand what a JIRA is then I'll attach a unit test ;-)
>

Re: Problem with method Plus in the Vector class

Posted by Marco Turchi <ma...@gmail.com>.
Hi
the constructor is the same in the second case too. I ran the same code
twice, once with mean = 1 and once with mean = -1.

Ok, I need to understand what a JIRA is then I'll attach a unit test ;-)

I have one last question:
I need to compute the covariance matrix of some input data in order to
compute the Mahalanobis distance. In my data, the number of features is
larger than the number of documents, and the data are very sparse. I
cannot compute the covariance matrix with the classic approach, because it
becomes very time-consuming and computationally expensive. Is there an
implementation of an approximate way to compute it in Mahout? I have had a
look in the library, but I cannot find one.

thanks for your help
Marco

On 20 Jul 2011, at 19:15, Ted Dunning wrote:

> Well, that is hard to understand without a complete test case.  In  
> the first
> case you have a vector with one element constructed while in the  
> second
> case, you wind up with two elements and don't show the constructor.
>
> Write up a JIRA and attach a unit test.
>
> My strong expectation is that this is a case of refrigerator  
> blindness.
>
> On Wed, Jul 20, 2011 at 9:22 AM, Marco Turchi  
> <ma...@gmail.com>wrote:
>
>> Hi
>> you are right about the sparse vector issue... but I'm constructing  
>> distV
>> in the same way changing only + and - in the variable mean. In both  
>> the
>> cases, I should have the same number of entries in the final vector.
>>
>> Thanks a lot for your help
>> Marco
>>
>>
>> On 20 Jul 2011, at 17:42, Ted Dunning wrote:
>>
>> You constructed the first vector with a dimension of 1.  It looks  
>> like you
>>> constructed the second one with a larger dimension of 2.
>>>
>>> When you offset a sparse vector, all of the zeros become non-zero  
>>> and the
>>> vector becomes dense.  This results in a bunch of cells being  
>>> created.
>>>
>>> On Wed, Jul 20, 2011 at 6:28 AM, marco turchi  
>>> <marco.turchi@gmail.com
>>>> wrote:
>>>
>>> Dear All,
>>>> I have a strange behaviour when I use the method Plus for Vector.
>>>>
>>>> I have a RandomAccessSparseVector vector, if I add a positive  
>>>> number, I
>>>> got
>>>> a new Vector where each element is the sum of the old value plus  
>>>> the
>>>> positive number. While if I add a negative number, the new vector  
>>>> has 1
>>>> more
>>>> entry:
>>>>
>>>>
>>>> RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
>>>> distV.setQuick(0,1);
>>>> double mean = 1;
>>>> RandomAccessSparseVector app =
>>>> (RandomAccessSparseVector)(distV.plus(mean));
>>>>
>>>> the output is
>>>> {0:2.0}
>>>>
>>>> if I have
>>>> double mean = -1;
>>>> RandomAccessSparseVector app =
>>>> (RandomAccessSparseVector)(distV.plus(mean));
>>>>
>>>> the output is
>>>> {1:1.0,0:-1.0}
>>>>
>>>> For sure I'm doing something wrong. Do you have any ideas where the
>>>> problem
>>>> is?
>>>>
>>>> Thanks a lot in advance
>>>> Marco
>>>>
>>>>
>>


Re: Problem with method Plus in the Vector class

Posted by Ted Dunning <te...@gmail.com>.
Well, that is hard to understand without a complete test case.  In the first
case you have a vector with one element constructed while in the second
case, you wind up with two elements and don't show the constructor.

Write up a JIRA and attach a unit test.

My strong expectation is that this is a case of refrigerator blindness.
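
For reference, the kind of test meant here could look like this (a
hypothetical JUnit 4 sketch with made-up names, assuming mahout-math and
JUnit are on the classpath):

import static org.junit.Assert.assertEquals;

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.junit.Test;

public class VectorPlusTest {
  @Test
  public void plusScalarKeepsCardinality() {
    Vector v = new RandomAccessSparseVector(1);
    v.setQuick(0, 1.0);
    Vector up = v.plus(1.0);
    Vector down = v.plus(-1.0);
    // Both results should have the same dimension as the input
    // and differ only by the offset applied to each cell.
    assertEquals(1, up.size());
    assertEquals(1, down.size());
    assertEquals(2.0, up.get(0), 0.0);
    assertEquals(0.0, down.get(0), 0.0);
  }
}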

On Wed, Jul 20, 2011 at 9:22 AM, Marco Turchi <ma...@gmail.com>wrote:

> Hi
> you are right about the sparse vector issue... but I'm constructing distV
> in the same way changing only + and - in the variable mean. In both the
> cases, I should have the same number of entries in the final vector.
>
> Thanks a lot for your help
> Marco
>
>
> On 20 Jul 2011, at 17:42, Ted Dunning wrote:
>
>  You constructed the first vector with a dimension of 1.  It looks like you
>> constructed the second one with a larger dimension of 2.
>>
>> When you offset a sparse vector, all of the zeros become non-zero and the
>> vector becomes dense.  This results in a bunch of cells being created.
>>
>> On Wed, Jul 20, 2011 at 6:28 AM, marco turchi <marco.turchi@gmail.com
>> >wrote:
>>
>>  Dear All,
>>> I have a strange behaviour when I use the method Plus for Vector.
>>>
>>> I have a RandomAccessSparseVector vector, if I add a positive number, I
>>> got
>>> a new Vector where each element is the sum of the old value plus the
>>> positive number. While if I add a negative number, the new vector has 1
>>> more
>>> entry:
>>>
>>>
>>> RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
>>> distV.setQuick(0,1);
>>> double mean = 1;
>>> RandomAccessSparseVector app =
>>> (RandomAccessSparseVector)(**distV.plus(mean));
>>>
>>> the output is
>>> {0:2.0}
>>>
>>> if I have
>>> double mean = -1;
>>> RandomAccessSparseVector app =
>>> (RandomAccessSparseVector)(**distV.plus(mean));
>>>
>>> the output is
>>> {1:1.0,0:-1.0}
>>>
>>> For sure I'm doing something wrong. Do you have any ideas where the
>>> problem
>>> is?
>>>
>>> Thanks a lot in advance
>>> Marco
>>>
>>>
>

Re: Problem with method Plus in the Vector class

Posted by Marco Turchi <ma...@gmail.com>.
Hi
you are right about the sparse vector issue... but I'm constructing distV
in the same way, changing only the sign of the variable mean. In both
cases I should get the same number of entries in the final vector.

Thanks a lot for your help
Marco

On 20 Jul 2011, at 17:42, Ted Dunning wrote:

> You constructed the first vector with a dimension of 1.  It looks  
> like you
> constructed the second one with a larger dimension of 2.
>
> When you offset a sparse vector, all of the zeros become non-zero  
> and the
> vector becomes dense.  This results in a bunch of cells being created.
>
> On Wed, Jul 20, 2011 at 6:28 AM, marco turchi  
> <ma...@gmail.com>wrote:
>
>> Dear All,
>> I have a strange behaviour when I use the method Plus for Vector.
>>
>> I have a RandomAccessSparseVector vector, if I add a positive  
>> number, I got
>> a new Vector where each element is the sum of the old value plus the
>> positive number. While if I add a negative number, the new vector  
>> has 1
>> more
>> entry:
>>
>>
>> RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
>> distV.setQuick(0,1);
>> double mean = 1;
>> RandomAccessSparseVector app =
>> (RandomAccessSparseVector)(distV.plus(mean));
>>
>> the output is
>> {0:2.0}
>>
>> if I have
>> double mean = -1;
>> RandomAccessSparseVector app =
>> (RandomAccessSparseVector)(distV.plus(mean));
>>
>> the output is
>> {1:1.0,0:-1.0}
>>
>> For sure I'm doing something wrong. Do you have any ideas where the  
>> problem
>> is?
>>
>> Thanks a lot in advance
>> Marco
>>


Re: Problem with method Plus in the Vector class

Posted by Ted Dunning <te...@gmail.com>.
You constructed the first vector with a dimension of 1.  It looks like you
constructed the second one with a larger dimension of 2.

When you offset a sparse vector, all of the zeros become non-zero and the
vector becomes dense.  This results in a bunch of cells being created.
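
A quick way to see the effect (a minimal sketch, assuming mahout-math is
on the classpath):

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class OffsetDemo {
  public static void main(String[] args) {
    // A sparse vector of dimension 2 with only index 0 set.
    Vector v = new RandomAccessSparseVector(2);
    v.setQuick(0, 1.0);

    // Adding a scalar offsets every cell, including the implicit zero at
    // index 1, so the result carries a value for each dimension.
    Vector shifted = v.plus(-1.0);
    System.out.println(shifted); // index 1 now carries -1.0; how 0.0 entries print depends on the version
  }
}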

On Wed, Jul 20, 2011 at 6:28 AM, marco turchi <ma...@gmail.com>wrote:

> Dear All,
> I have a strange behaviour when I use the method Plus for Vector.
>
> I have a RandomAccessSparseVector vector, if I add a positive number, I got
> a new Vector where each element is the sum of the old value plus the
> positive number. While if I add a negative number, the new vector has 1
> more
> entry:
>
>
> RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
> distV.setQuick(0,1);
> double mean = 1;
> RandomAccessSparseVector app =
> (RandomAccessSparseVector)(distV.plus(mean));
>
> the output is
> {0:2.0}
>
> if I have
> double mean = -1;
> RandomAccessSparseVector app =
> (RandomAccessSparseVector)(distV.plus(mean));
>
> the output is
> {1:1.0,0:-1.0}
>
> For sure I'm doing something wrong. Do you have any ideas where the problem
> is?
>
> Thanks a lot in advance
> Marco
>

Re: Problem with method Plus in the Vector class

Posted by marco turchi <ma...@gmail.com>.
Dear Sean,
thanks a lot, I'll update the jar.

Thanks
Marco

On Thu, Jul 21, 2011 at 1:06 PM, Sean Owen <sr...@gmail.com> wrote:

> Going waaay back to the original question -- I can't reproduce this in the
> latest code.
>
> The result ought to be "{}" since RandomAccessSparseVector will only print
> entries that have a value set and are not defaulted to 0.0. And it's smart
> enough in this case to remove the entry you have set since its value goes
> back to 0.0.
>
> But, the output is "{". There's a bug in the toString() method, and I would
> not be surprised if it's my fat fingers that made it. I will correct the
> method.
>
> Sean
>
> On Wed, Jul 20, 2011 at 2:28 PM, marco turchi <marco.turchi@gmail.com
> >wrote:
>
> > Dear All,
> > I have a strange behaviour when I use the method Plus for Vector.
> >
> > I have a RandomAccessSparseVector vector, if I add a positive number, I
> got
> > a new Vector where each element is the sum of the old value plus the
> > positive number. While if I add a negative number, the new vector has 1
> > more
> > entry:
> >
> >
> > RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
> > distV.setQuick(0,1);
> > double mean = 1;
> > RandomAccessSparseVector app =
> > (RandomAccessSparseVector)(distV.plus(mean));
> >
> > the output is
> > {0:2.0}
> >
> > if I have
> > double mean = -1;
> > RandomAccessSparseVector app =
> > (RandomAccessSparseVector)(distV.plus(mean));
> >
> > the output is
> > {1:1.0,0:-1.0}
> >
> > For sure I'm doing something wrong. Do you have any ideas where the
> problem
> > is?
> >
> > Thanks a lot in advance
> > Marco
> >
>

Re: Problem with method Plus in the Vector class

Posted by Sean Owen <sr...@gmail.com>.
Going waaay back to the original question -- I can't reproduce this in the
latest code.

The result ought to be "{}" since RandomAccessSparseVector will only print
entries that have a value set and are not defaulted to 0.0. And it's smart
enough in this case to remove the entry you have set since its value goes
back to 0.0.

But, the output is "{". There's a bug in the toString() method, and I would
not be surprised if it's my fat fingers that made it. I will correct the
method.
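
One way to check that expectation without relying on toString() (a small
sketch against the current API):

import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;

public class PlusCheck {
  public static void main(String[] args) {
    Vector distV = new RandomAccessSparseVector(1);
    distV.setQuick(0, 1.0);
    Vector app = distV.plus(-1.0);
    // The only set entry goes back to 0.0, so per the description above
    // no non-default entries should remain.
    System.out.println(app.getNumNondefaultElements()); // expected: 0
  }
}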

Sean

On Wed, Jul 20, 2011 at 2:28 PM, marco turchi <ma...@gmail.com>wrote:

> Dear All,
> I have a strange behaviour when I use the method Plus for Vector.
>
> I have a RandomAccessSparseVector vector, if I add a positive number, I got
> a new Vector where each element is the sum of the old value plus the
> positive number. While if I add a negative number, the new vector has 1
> more
> entry:
>
>
> RandomAccessSparseVector distV = new RandomAccessSparseVector(1);
> distV.setQuick(0,1);
> double mean = 1;
> RandomAccessSparseVector app =
> (RandomAccessSparseVector)(distV.plus(mean));
>
> the output is
> {0:2.0}
>
> if I have
> double mean = -1;
> RandomAccessSparseVector app =
> (RandomAccessSparseVector)(distV.plus(mean));
>
> the output is
> {1:1.0,0:-1.0}
>
> For sure I'm doing something wrong. Do you have any ideas where the problem
> is?
>
> Thanks a lot in advance
> Marco
>