Posted to dev@hama.apache.org by Yexi Jiang <yx...@apache.org> on 2014/02/24 22:14:59 UTC

Implementation of DoubleVector/DenseDoubleVector/SparseDoubleVector

Hi, All,

I am currently working on the SparseDoubleVector (HAMA-863) and found some
unclear places about the vector implementation.

1. What is the definition of sqrt for a vector? According to the
implementation, it is implemented as elementwise sqrt. In that case, a
problem will occur if one of the entries is negative.

2. Most of the operators are conducted on a copy of the current object. Do
we also need to provide a set of operators that directly modify the current
object itself? e.g. addOriginal, subtractOriginal, etc.

3. When a DenseDoubleVector operates with a SparseDoubleVector, what will
be the concrete type of the result object? A simple implementation is to
always return a SparseDoubleVector, even if the result is dense. A more
complex implementation is to maintain a sparsity ratio (the ratio of
non-default entries); if the ratio exceeds a threshold, a DenseDoubleVector
is returned.

4. Is the toArray method available for SparseDoubleVector? In my opinion,
it is better not to do that.
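
To make the sparsity-ratio idea in point 3 concrete, here is a minimal
sketch. It is not Hama code: the class name ResultTypeChooser, the 0.5
threshold, and the choice of 0.0 as the "default" entry are all assumptions
made for illustration, and the sparse operand is modelled as a plain
index -> value map.

```java
import java.util.Map;

// Hypothetical helper, not part of Hama: decides how a result should be
// stored based on the fraction of non-default (here: non-zero) entries.
final class ResultTypeChooser {
  // Assumed threshold; the thread does not propose a concrete value.
  static final double DENSE_THRESHOLD = 0.5;

  /** Ratio of non-zero entries to the total dimension. */
  static double sparsityRatio(double[] values) {
    if (values.length == 0) {
      return 0.0;
    }
    int nonZero = 0;
    for (double v : values) {
      if (v != 0.0) {
        nonZero++;
      }
    }
    return (double) nonZero / values.length;
  }

  /** True if the ratio exceeds the threshold, i.e. store the result densely. */
  static boolean shouldBeDense(double[] values) {
    return sparsityRatio(values) > DENSE_THRESHOLD;
  }

  /** Adds a sparse operand (index -> value) to a copy of a dense operand. */
  static double[] add(double[] dense, Map<Integer, Double> sparse) {
    double[] result = dense.clone();
    for (Map.Entry<Integer, Double> e : sparse.entrySet()) {
      result[e.getKey()] += e.getValue();
    }
    return result;
  }
}
```

A caller would compute the result, check shouldBeDense(result), and wrap the
values in whichever concrete vector type fits. The cost of this design is an
extra pass over the result to count non-default entries.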


Regards,
Yexi

Re: Implementation of DoubleVector/DenseDoubleVector/SparseDoubleVector

Posted by "Edward J. Yoon" <ed...@apache.org>.

Yeah, an array is memory inefficient. We might want to use a map for the
sparse vector, but either way I think we have to keep the implementations
consistent with each other.
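
One way to read "consistency between implementations" is that the dense and
the map-backed vector expose the same methods and return the same values
for the same logical content. The sketch below assumes a deliberately tiny
interface called SimpleVector; that name and its four methods are
hypothetical and much smaller than Hama's actual DoubleVector.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical minimal interface, used only for this illustration.
interface SimpleVector {
  int getDimension();
  double get(int index);
  void set(int index, double value);
  double[] toArray();
}

// Dense variant: every entry is stored in a double[].
final class DenseSimpleVector implements SimpleVector {
  private final double[] values;

  DenseSimpleVector(int dimension) {
    values = new double[dimension];
  }

  public int getDimension() { return values.length; }
  public double get(int index) { return values[index]; }
  public void set(int index, double value) { values[index] = value; }
  public double[] toArray() { return values.clone(); }
}

// Sparse variant: only non-zero entries are kept in a map.
final class MapSimpleVector implements SimpleVector {
  private final int dimension;
  private final Map<Integer, Double> entries = new HashMap<>();

  MapSimpleVector(int dimension) {
    this.dimension = dimension;
  }

  public int getDimension() { return dimension; }

  public double get(int index) {
    checkIndex(index);
    Double v = entries.get(index);
    return v == null ? 0.0 : v;   // absent entries read as the default 0.0
  }

  public void set(int index, double value) {
    checkIndex(index);
    if (value == 0.0) {
      entries.remove(index);      // never store the default value
    } else {
      entries.put(index, value);
    }
  }

  public double[] toArray() {
    double[] arr = new double[dimension];
    for (Map.Entry<Integer, Double> e : entries.entrySet()) {
      arr[e.getKey()] = e.getValue();
    }
    return arr;
  }

  private void checkIndex(int index) {
    if (index < 0 || index >= dimension) {
      throw new IndexOutOfBoundsException("index " + index);
    }
  }
}
```

With this shape, code written against SimpleVector does not need to know
which representation it was handed.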



On Tue, Mar 4, 2014 at 12:50 AM, Yexi Jiang <ye...@gmail.com> wrote:
>> I think SparseDoubleVector can be represented by index/value pairs.
>> And, toArray can be implemented like:
>>
>> public double[] toArray() {
>>   double[] arr = new double[size];
>>   for (Element e : vector) {
>>     arr[e.getIndex()] = e.getValue();
>>   }
>>   return arr;
>> }
>
>
> The above piece of code would still waste a lot of space: suppose the
> dimension (size) is 10^6 and only a couple of entries are set. The method
> would still return an array with one million entries.
>
> If we do not care about the space cost, this method is fine.



-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.

Re: Implementation of DoubleVector/DenseDoubleVector/SparseDoubleVector

Posted by Yexi Jiang <ye...@gmail.com>.

> I think SparseDoubleVector can be represented by index/value pairs.
> And, toArray can be implemented like:
>
> public double[] toArray() {
>   double[] arr = new double[size];
>   for (Element e : vector) {
>     arr[e.getIndex()] = e.getValue();
>   }
>   return arr;
> }


The above piece of code would still waste a lot of space: suppose the
dimension (size) is 10^6 and only a couple of entries are set. The method
would still return an array with one million entries.

If we do not care about the space cost, this method is fine.
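
To put a number on it: a double takes 8 bytes in Java, so a double[] of
size 10^6 carries 8,000,000 bytes of element data no matter how few entries
are set. Many operations can avoid that by walking only the stored entries.
The sketch below is an illustration under assumptions: SparseDot is a
hypothetical class, and sparse vectors are modelled as index -> value maps
rather than Hama types.

```java
import java.util.Map;

// Hypothetical sketch: operate on stored entries without calling toArray().
final class SparseDot {
  /** Dot product of two sparse vectors stored as index -> value maps. */
  static double dot(Map<Integer, Double> a, Map<Integer, Double> b) {
    // Iterate over the smaller map and look up matches in the larger one.
    Map<Integer, Double> small = a.size() <= b.size() ? a : b;
    Map<Integer, Double> large = (small == a) ? b : a;
    double sum = 0.0;
    for (Map.Entry<Integer, Double> e : small.entrySet()) {
      Double other = large.get(e.getKey());
      if (other != null) {
        sum += e.getValue() * other;
      }
    }
    return sum;
  }

  /** Bytes of element data in a dense double[] of the given size. */
  static long denseArrayBytes(int size) {
    return (long) size * Double.BYTES;
  }
}
```

The work done by dot() is proportional to the number of stored entries in
the smaller operand, not to the dimension.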








-- 
------
Yexi Jiang,
ECS 251,  yjian004@cs.fiu.edu
School of Computer and Information Science,
Florida International University
Homepage: http://users.cis.fiu.edu/~yjian004/

Re: Implementation of DoubleVector/DenseDoubleVector/SparseDoubleVector

Posted by "Edward J. Yoon" <ed...@apache.org>.

> 4. Is the toArray method available for SparseDoubleVector? In my opinion,
> it is better not to do that.

I think SparseDoubleVector can be represented by index/value pairs.
And, toArray can be implemented like:

public double[] toArray() {
  double[] arr = new double[size];
  // Copy each stored (index, value) pair; unset positions stay 0.0.
  for (Element e : vector) {
    arr[e.getIndex()] = e.getValue();
  }
  return arr;
}

For bit vector, java.util.BitSet can be used.
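
As a rough sketch of the BitSet suggestion, the helper below builds bit
vectors and counts the positions set in both. BitVectorDemo and its method
names are hypothetical; only the java.util.BitSet calls (set, clone, and,
cardinality) are standard library API.

```java
import java.util.BitSet;

// Hypothetical demo class showing java.util.BitSet as a bit vector.
final class BitVectorDemo {
  /** Builds a bit vector with the given positions set to 1. */
  static BitSet fromIndices(int... indices) {
    BitSet bits = new BitSet();
    for (int i : indices) {
      bits.set(i);
    }
    return bits;
  }

  /** Number of positions set in both vectors (a bitwise dot product). */
  static int overlap(BitSet a, BitSet b) {
    BitSet copy = (BitSet) a.clone();  // and() mutates, so work on a copy
    copy.and(b);
    return copy.cardinality();
  }
}
```

Note that BitSet is a packed dense structure: its storage grows with the
highest set index, so it saves space per entry rather than skipping unset
ranges.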




-- 
Edward J. Yoon (@eddieyoon)
Chief Executive Officer
DataSayer, Inc.