Posted to user@mahout.apache.org by Weishung Chung <we...@gmail.com> on 2011/02/11 17:33:19 UTC

cardinality vs size

Is cardinality the original size of the vector, including zeros, and is size
the number of nonzeros in the vector?
I am referring to

 public RandomAccessSparseVector(int cardinality, int size)

Thank you :)

Re: cardinality vs size

Posted by Weishung Chung <we...@gmail.com>.
Sounds good to me :)

On Fri, Feb 11, 2011 at 6:46 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Any objections to that? I'd go for a quick refactoring without a jira if no
> one objects.
>
> --sebastian
>
>
> On 11.02.2011 22:36, Ted Dunning wrote:
>
>> +10!!
>>
>> On Fri, Feb 11, 2011 at 10:13 AM, Sebastian Schelter <ssc@apache.org
>> <ma...@apache.org>> wrote:
>>
>>    Maybe we should rename them to something like dimension and
>>    initialCapacity then?
>>
>>    --sebastian
>>
>>
>>    On 11.02.2011 18:49, Weishung Chung wrote:
>>
>>        Thanks a lot for the explanation. This really helps me out :)
>>
>>        On Fri, Feb 11, 2011 at 11:38 AM, Ted
>>        Dunning<ted.dunning@gmail.com <ma...@gmail.com>>
>>
>>          wrote:
>>
>>            Argh!
>>
>>            This is a really confusing API.
>>
>>            Cardinality is the dimension of the vector.
>>
>>            Size is the number of storage elements that you want to have
>>            in the vector
>>            initially, much in the style of ArrayList where you specify
>>            how many
>>            elements to pre-allocate.
>>
>>            On Fri, Feb 11, 2011 at 8:33 AM, Weishung
>>            Chung<weishung@gmail.com <ma...@gmail.com>>
>>
>>            wrote:
>>
>>                Is cardinality the original size of the vector including
>>                zeros and size
>>
>>            is
>>
>>                the number of nonzeros in the vector?
>>                I am referring to
>>
>>                  public RandomAccessSparseVector(int cardinality, int
>> size)
>>
>>                Thank you :)
>>
>>
>>
>>
>

Re: cardinality vs size

Posted by Weishung Chung <we...@gmail.com>.
Totally agree with Ted!

On Sat, Feb 12, 2011 at 12:57 PM, Ted Dunning <te...@gmail.com> wrote:

> Actually, I think that most of us understand that size refers to the
> dimension of the vector (by analogy with ArrayList).
>
> How about we go with a strong convention that size() returns dimensionality
> and change the constructor args for RASV.  The real problem here is that
> second argument.
>
> Then if we need to, we can come up with an accessor that gives us back the
> allocated capacity of a vector.  For DenseVector, that would be equal to
> size().  For RASV it would start at the initialCapacity and grow as needed
> but always be <= size() + epsilon and >= the number of non-zeros.  For some
> other sparse formats, it might be equal to the current number of non-zeros.
>
> On Sat, Feb 12, 2011 at 8:52 AM, Weishung Chung <we...@gmail.com>
> wrote:
>
> > I believe most of us understand that Vector.size() and Matrix.size()
> refer
> > to the size of the vector or matrix, so it's not that a big deal.
> > But I would recommend just rename the size in the constructor to
> > initialCapacity which would be clear to most of us that it refers to the
> > initialCapacity of the internal backing map. Just my two cents :D
> >
> > RandomAccessSparseVector(int cardinality, int size)
> >
> >
> > On Sat, Feb 12, 2011 at 5:03 AM, Sebastian Schelter <ss...@apache.org>
> > wrote:
> >
> > > You're right, I forgot about that. We'd have to rename Vector.size() to
> > > Vector.dimension() to be consistent... And maybe Matrix.size() too?
> > >
> > > Makes the refactoring a little bit more complicated. I think we should
> > also
> > > keep Vector.size() and Matrix.size() as deprecated methods for a little
> > time
> > > so we don't break any uncommitted patches.
> > >
> > > What do you think?
> > >
> > > --sebastian
> > >
> > >
> > > On 12.02.2011 03:29, Ted Dunning wrote:
> > >
> > >> It's a great idea.
> > >>
> > >> Changing any accessor names is a bit of a bigger deal, but still
> > >> probably a good idea if we get consensus.
> > >>
> > >> On Fri, Feb 11, 2011 at 4:46 PM, Sebastian Schelter <ssc@apache.org
> > >> <ma...@apache.org>> wrote:
> > >>
> > >>    Any objections to that? I'd go for a quick refactoring without a
> > >>    jira if no one objects.
> > >>
> > >>
> > >>
> > >
> >
>

Re: cardinality vs size

Posted by Ted Dunning <te...@gmail.com>.
Actually, I think that most of us understand that size refers to the
dimension of the vector (by analogy with ArrayList).

How about we go with a strong convention that size() returns dimensionality
and change the constructor args for RASV (RandomAccessSparseVector)?  The
real problem here is that second argument.

Then if we need to, we can come up with an accessor that gives us back the
allocated capacity of a vector.  For DenseVector, that would be equal to
size().  For RASV it would start at the initialCapacity and grow as needed
but always be <= size() + epsilon and >= the number of non-zeros.  For some
other sparse formats, it might be equal to the current number of non-zeros.
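
A minimal sketch of what such a capacity accessor could look like. Every name
here (HasCapacity, ToyDense, ToySparse, capacity(), numNonZeros()) is
hypothetical and chosen for illustration; none of this is Mahout's actual API,
and the array-backed storage is a stand-in for the real hash-based one.

```java
import java.util.Arrays;

// Hypothetical interface: size() is the dimension, capacity() the allocated slots.
interface HasCapacity {
    int size();
    int capacity();
}

// Dense case: every element has a slot, so capacity() always equals size().
final class ToyDense implements HasCapacity {
    private final double[] values;

    ToyDense(int size) { values = new double[size]; }

    public int size() { return values.length; }
    public int capacity() { return values.length; }
}

// Sparse case: starts at initialCapacity and grows on demand.
final class ToySparse implements HasCapacity {
    private final int size;
    private int[] indices;
    private double[] values;
    private int used; // number of occupied slots

    ToySparse(int size, int initialCapacity) {
        this.size = size;
        this.indices = new int[initialCapacity];
        this.values = new double[initialCapacity];
    }

    public int size() { return size; }
    public int capacity() { return indices.length; }

    int numNonZeros() {
        int n = 0;
        for (int i = 0; i < used; i++) {
            if (values[i] != 0.0) n++;
        }
        return n;
    }

    void set(int index, double value) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        // Linear search keeps the sketch short; overwrite if already stored.
        for (int i = 0; i < used; i++) {
            if (indices[i] == index) { values[i] = value; return; }
        }
        if (value == 0.0) return; // never allocate a slot for a zero
        if (used == indices.length) {
            // Double the storage, but never grow beyond the dimension.
            int grown = Math.min(size, Math.max(1, 2 * indices.length));
            indices = Arrays.copyOf(indices, grown);
            values = Arrays.copyOf(values, grown);
        }
        indices[used] = index;
        values[used] = value;
        used++;
    }
}
```

In this sketch capacity() is always at least numNonZeros(), and growth stops
at size(), which roughly matches the bounds described above.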

On Sat, Feb 12, 2011 at 8:52 AM, Weishung Chung <we...@gmail.com> wrote:

> I believe most of us understand that Vector.size() and Matrix.size() refer
> to the size of the vector or matrix, so it's not that a big deal.
> But I would recommend just rename the size in the constructor to
> initialCapacity which would be clear to most of us that it refers to the
> initialCapacity of the internal backing map. Just my two cents :D
>
> RandomAccessSparseVector(int cardinality, int size)
>
>
> On Sat, Feb 12, 2011 at 5:03 AM, Sebastian Schelter <ss...@apache.org>
> wrote:
>
> > You're right, I forgot about that. We'd have to rename Vector.size() to
> > Vector.dimension() to be consistent... And maybe Matrix.size() too?
> >
> > Makes the refactoring a little bit more complicated. I think we should
> also
> > keep Vector.size() and Matrix.size() as deprecated methods for a little
> time
> > so we don't break any uncommitted patches.
> >
> > What do you think?
> >
> > --sebastian
> >
> >
> > On 12.02.2011 03:29, Ted Dunning wrote:
> >
> >> It's a great idea.
> >>
> >> Changing any accessor names is a bit of a bigger deal, but still
> >> probably a good idea if we get consensus.
> >>
> >> On Fri, Feb 11, 2011 at 4:46 PM, Sebastian Schelter <ssc@apache.org
> >> <ma...@apache.org>> wrote:
> >>
> >>    Any objections to that? I'd go for a quick refactoring without a
> >>    jira if no one objects.
> >>
> >>
> >>
> >
>

Re: cardinality vs size

Posted by Ted Dunning <te...@gmail.com>.
Great.

Thanks!

On Mon, Feb 14, 2011 at 3:26 PM, Sebastian Schelter <ss...@apache.org> wrote:

> I did exactly what Weishung proposed, just renamed the size arg to
> initialCapacity, I think we're good with that.
>
> --sebastian
>
> On 12.02.2011 17:52, Weishung Chung wrote:
>
>> I believe most of us understand that Vector.size() and Matrix.size()
>> refer to the size of the vector or matrix, so it's not that a big deal.
>> But I would recommend just rename the size in the constructor to
>> initialCapacity which would be clear to most of us that it refers to the
>> initialCapacity of the internal backing map. Just my two cents :D
>>
>> RandomAccessSparseVector(int cardinality, int size)
>>
>>
>> On Sat, Feb 12, 2011 at 5:03 AM, Sebastian Schelter <ssc@apache.org
>> <ma...@apache.org>> wrote:
>>
>>    You're right, I forgot about that. We'd have to rename Vector.size()
>>    to Vector.dimension() to be consistent... And maybe Matrix.size() too?
>>
>>    Makes the refactoring a little bit more complicated. I think we
>>    should also keep Vector.size() and Matrix.size() as deprecated
>>    methods for a little time so we don't break any uncommitted patches.
>>
>>    What do you think?
>>
>>    --sebastian
>>
>>
>>    On 12.02.2011 03:29, Ted Dunning wrote:
>>
>>        It's a great idea.
>>
>>        Changing any accessor names is a bit of a bigger deal, but still
>>        probably a good idea if we get consensus.
>>
>>        On Fri, Feb 11, 2011 at 4:46 PM, Sebastian Schelter
>>        <ssc@apache.org <ma...@apache.org>
>>        <mailto:ssc@apache.org <ma...@apache.org>>> wrote:
>>
>>            Any objections to that? I'd go for a quick refactoring without
>> a
>>            jira if no one objects.
>>
>>
>>
>>
>>
>

Re: cardinality vs size

Posted by Sebastian Schelter <ss...@apache.org>.
I did exactly what Weishung proposed and just renamed the size arg to
initialCapacity. I think we're good with that.

--sebastian

On 12.02.2011 17:52, Weishung Chung wrote:
> I believe most of us understand that Vector.size() and Matrix.size()
> refer to the size of the vector or matrix, so it's not that a big deal.
> But I would recommend just rename the size in the constructor to
> initialCapacity which would be clear to most of us that it refers to the
> initialCapacity of the internal backing map. Just my two cents :D
>
> RandomAccessSparseVector(int cardinality, int size)
>
>
> On Sat, Feb 12, 2011 at 5:03 AM, Sebastian Schelter <ssc@apache.org
> <ma...@apache.org>> wrote:
>
>     You're right, I forgot about that. We'd have to rename Vector.size()
>     to Vector.dimension() to be consistent... And maybe Matrix.size() too?
>
>     Makes the refactoring a little bit more complicated. I think we
>     should also keep Vector.size() and Matrix.size() as deprecated
>     methods for a little time so we don't break any uncommitted patches.
>
>     What do you think?
>
>     --sebastian
>
>
>     On 12.02.2011 03:29, Ted Dunning wrote:
>
>         It's a great idea.
>
>         Changing any accessor names is a bit of a bigger deal, but still
>         probably a good idea if we get consensus.
>
>         On Fri, Feb 11, 2011 at 4:46 PM, Sebastian Schelter
>         <ssc@apache.org <ma...@apache.org>
>         <mailto:ssc@apache.org <ma...@apache.org>>> wrote:
>
>             Any objections to that? I'd go for a quick refactoring without a
>             jira if no one objects.
>
>
>
>


Re: cardinality vs size

Posted by Weishung Chung <we...@gmail.com>.
I believe most of us understand that Vector.size() and Matrix.size() refer
to the size of the vector or matrix, so it's not that big a deal.
But I would recommend just renaming the size parameter in the constructor to
initialCapacity, which would make it clear to most of us that it refers to the
initial capacity of the internal backing map. Just my two cents :D

RandomAccessSparseVector(int cardinality, int size)


On Sat, Feb 12, 2011 at 5:03 AM, Sebastian Schelter <ss...@apache.org> wrote:

> You're right, I forgot about that. We'd have to rename Vector.size() to
> Vector.dimension() to be consistent... And maybe Matrix.size() too?
>
> Makes the refactoring a little bit more complicated. I think we should also
> keep Vector.size() and Matrix.size() as deprecated methods for a little time
> so we don't break any uncommitted patches.
>
> What do you think?
>
> --sebastian
>
>
> On 12.02.2011 03:29, Ted Dunning wrote:
>
>> It's a great idea.
>>
>> Changing any accessor names is a bit of a bigger deal, but still
>> probably a good idea if we get consensus.
>>
>> On Fri, Feb 11, 2011 at 4:46 PM, Sebastian Schelter <ssc@apache.org
>> <ma...@apache.org>> wrote:
>>
>>    Any objections to that? I'd go for a quick refactoring without a
>>    jira if no one objects.
>>
>>
>>
>

Re: cardinality vs size

Posted by Sebastian Schelter <ss...@apache.org>.
You're right, I forgot about that. We'd have to rename Vector.size() to 
Vector.dimension() to be consistent... And maybe Matrix.size() too?

That makes the refactoring a little more complicated. I think we should
also keep Vector.size() and Matrix.size() as deprecated methods for a
little while so we don't break any uncommitted patches.
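
A rough sketch of that deprecation approach, assuming Java 8 or later for the
default method. ToyVector and ToyDenseVector are hypothetical names used only
for illustration and are not Mahout's Vector types.

```java
// Hypothetical interface for illustration; not Mahout's actual Vector.
interface ToyVector {
    // The new, unambiguous accessor.
    int dimension();

    /** @deprecated use {@link #dimension()} instead. */
    @Deprecated
    default int size() {
        // The old name keeps working by delegating to the new one.
        return dimension();
    }
}

final class ToyDenseVector implements ToyVector {
    private final double[] values;

    ToyDenseVector(int dimension) { values = new double[dimension]; }

    @Override
    public int dimension() { return values.length; }
}
```

Existing callers of size() would still compile, with a deprecation warning,
until the old method is removed.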

What do you think?

--sebastian

On 12.02.2011 03:29, Ted Dunning wrote:
> It's a great idea.
>
> Changing any accessor names is a bit of a bigger deal, but still
> probably a good idea if we get consensus.
>
> On Fri, Feb 11, 2011 at 4:46 PM, Sebastian Schelter <ssc@apache.org
> <ma...@apache.org>> wrote:
>
>     Any objections to that? I'd go for a quick refactoring without a
>     jira if no one objects.
>
>


Re: cardinality vs size

Posted by Robin Anil <ro...@gmail.com>.
Yeah, Initial size sounds perfect

On Sat, Feb 12, 2011 at 7:59 AM, Ted Dunning <te...@gmail.com> wrote:

> It's a great idea.
>
> Changing any accessor names is a bit of a bigger deal, but still probably a
> good idea if we get consensus.
>
> On Fri, Feb 11, 2011 at 4:46 PM, Sebastian Schelter <ss...@apache.org>
> wrote:
>
> > Any objections to that? I'd go for a quick refactoring without a jira if
> no
> > one objects.
>

Re: cardinality vs size

Posted by Ted Dunning <te...@gmail.com>.
It's a great idea.

Changing any accessor names is a bit of a bigger deal, but still probably a
good idea if we get consensus.

On Fri, Feb 11, 2011 at 4:46 PM, Sebastian Schelter <ss...@apache.org> wrote:

> Any objections to that? I'd go for a quick refactoring without a jira if no
> one objects.

Re: cardinality vs size

Posted by Sebastian Schelter <ss...@apache.org>.
Any objections to that? I'd go for a quick refactoring without a jira if 
no one objects.

--sebastian

On 11.02.2011 22:36, Ted Dunning wrote:
> +10!!
>
> On Fri, Feb 11, 2011 at 10:13 AM, Sebastian Schelter <ssc@apache.org
> <ma...@apache.org>> wrote:
>
>     Maybe we should rename them to something like dimension and
>     initialCapacity then?
>
>     --sebastian
>
>
>     On 11.02.2011 18:49, Weishung Chung wrote:
>
>         Thanks a lot for the explanation. This really helps me out :)
>
>         On Fri, Feb 11, 2011 at 11:38 AM, Ted
>         Dunning<ted.dunning@gmail.com <ma...@gmail.com>>
>           wrote:
>
>             Argh!
>
>             This is a really confusing API.
>
>             Cardinality is the dimension of the vector.
>
>             Size is the number of storage elements that you want to have
>             in the vector
>             initially, much in the style of ArrayList where you specify
>             how many
>             elements to pre-allocate.
>
>             On Fri, Feb 11, 2011 at 8:33 AM, Weishung
>             Chung<weishung@gmail.com <ma...@gmail.com>>
>             wrote:
>
>                 Is cardinality the original size of the vector including
>                 zeros and size
>
>             is
>
>                 the number of nonzeros in the vector?
>                 I am referring to
>
>                   public RandomAccessSparseVector(int cardinality, int size)
>
>                 Thank you :)
>
>
>


Re: cardinality vs size

Posted by Ted Dunning <te...@gmail.com>.
+10!!

On Fri, Feb 11, 2011 at 10:13 AM, Sebastian Schelter <ss...@apache.org> wrote:

> Maybe we should rename them to something like dimension and initialCapacity
> then?
>
> --sebastian
>
>
> On 11.02.2011 18:49, Weishung Chung wrote:
>
>> Thanks a lot for the explanation. This really helps me out :)
>>
>> On Fri, Feb 11, 2011 at 11:38 AM, Ted Dunning<te...@gmail.com>
>>  wrote:
>>
>>  Argh!
>>>
>>> This is a really confusing API.
>>>
>>> Cardinality is the dimension of the vector.
>>>
>>> Size is the number of storage elements that you want to have in the
>>> vector
>>> initially, much in the style of ArrayList where you specify how many
>>> elements to pre-allocate.
>>>
>>> On Fri, Feb 11, 2011 at 8:33 AM, Weishung Chung<we...@gmail.com>
>>> wrote:
>>>
>>>  Is cardinality the original size of the vector including zeros and size
>>>>
>>> is
>>>
>>>> the number of nonzeros in the vector?
>>>> I am referring to
>>>>
>>>>  public RandomAccessSparseVector(int cardinality, int size)
>>>>
>>>> Thank you :)
>>>>
>>>>
>

Re: cardinality vs size

Posted by Sebastian Schelter <ss...@apache.org>.
Maybe we should rename them to something like dimension and 
initialCapacity then?

--sebastian

On 11.02.2011 18:49, Weishung Chung wrote:
> Thanks a lot for the explanation. This really helps me out :)
>
> On Fri, Feb 11, 2011 at 11:38 AM, Ted Dunning<te...@gmail.com>  wrote:
>
>> Argh!
>>
>> This is a really confusing API.
>>
>> Cardinality is the dimension of the vector.
>>
>> Size is the number of storage elements that you want to have in the vector
>> initially, much in the style of ArrayList where you specify how many
>> elements to pre-allocate.
>>
>> On Fri, Feb 11, 2011 at 8:33 AM, Weishung Chung<we...@gmail.com>
>> wrote:
>>
>>> Is cardinality the original size of the vector including zeros and size
>> is
>>> the number of nonzeros in the vector?
>>> I am referring to
>>>
>>>   public RandomAccessSparseVector(int cardinality, int size)
>>>
>>> Thank you :)
>>>


Re: cardinality vs size

Posted by Weishung Chung <we...@gmail.com>.
Thanks a lot for the explanation. This really helps me out :)

On Fri, Feb 11, 2011 at 11:38 AM, Ted Dunning <te...@gmail.com> wrote:

> Argh!
>
> This is a really confusing API.
>
> Cardinality is the dimension of the vector.
>
> Size is the number of storage elements that you want to have in the vector
> initially, much in the style of ArrayList where you specify how many
> elements to pre-allocate.
>
> On Fri, Feb 11, 2011 at 8:33 AM, Weishung Chung <we...@gmail.com>
> wrote:
>
> > Is cardinality the original size of the vector including zeros and size
> is
> > the number of nonzeros in the vector?
> > I am referring to
> >
> >  public RandomAccessSparseVector(int cardinality, int size)
> >
> > Thank you :)
> >
>

Re: cardinality vs size

Posted by Ted Dunning <te...@gmail.com>.
Argh!

This is a really confusing API.

Cardinality is the dimension of the vector.

Size is the number of storage elements that you want to have in the vector
initially, much in the style of ArrayList where you specify how many
elements to pre-allocate.
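
A minimal sketch of the distinction, using a hypothetical ToySparseVector
backed by a HashMap. This is not Mahout's RandomAccessSparseVector; the
parameter names dimension and initialCapacity are assumptions that stand in
for "cardinality" and "size" in the constructor above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy class for illustration only.
final class ToySparseVector {
    private final int dimension;               // logical length, zeros included
    private final Map<Integer, Double> values; // only the stored entries

    ToySparseVector(int dimension, int initialCapacity) {
        this.dimension = dimension;
        // Like new ArrayList<>(n): a pre-allocation hint, not a limit.
        this.values = new HashMap<>(initialCapacity);
    }

    int dimension() { return dimension; }

    int storedEntries() { return values.size(); }

    double get(int index) {
        checkIndex(index);
        return values.getOrDefault(index, 0.0); // absent entries read as zero
    }

    void set(int index, double value) {
        checkIndex(index);
        if (value == 0.0) {
            values.remove(index); // zeros are not stored
        } else {
            values.put(index, value);
        }
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= dimension) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }

    public static void main(String[] args) {
        // A million-dimensional vector that pre-allocates room for two entries.
        ToySparseVector v = new ToySparseVector(1_000_000, 2);
        v.set(3, 1.5);
        v.set(42, -2.0);
        v.set(999_999, 7.0); // going past the hint is fine; storage just grows
        System.out.println(v.dimension());
        System.out.println(v.storedEntries());
    }
}
```

The second argument only affects how much storage is set aside up front. The
dimension stays fixed no matter how many entries are stored.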

On Fri, Feb 11, 2011 at 8:33 AM, Weishung Chung <we...@gmail.com> wrote:

> Is cardinality the original size of the vector including zeros and size is
> the number of nonzeros in the vector?
> I am referring to
>
>  public RandomAccessSparseVector(int cardinality, int size)
>
> Thank you :)
>