You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@commons.apache.org by Phil Steitz <ph...@gmail.com> on 2010/03/13 21:39:49 UTC

Re: [math] 2.1 Release plan

Phil Steitz wrote:
> Phil Steitz wrote:
>> luc.maisonobe@free.fr wrote:
>>> ----- "Phil Steitz (JIRA)" <ji...@apache.org> a écrit :
>>>
>>>> [
>>>> https://issues.apache.org/jira/browse/MATH-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>> ]
>>>>
>>>> Phil Steitz resolved MATH-282.
>>>> ------------------------------
>>>>
>>>>     Resolution: Fixed
>>> Great !
>>>
>>> With this fix, there are no pending issues for 2.1. Gilles is working on two unscheduled issues.
>>> I suggest once he has committed the fixes, we can go ahead to release 2.1.
>>>
>>> Any thoughts about this ?
>> First, let me volunteer to RM.  I will take the opportunity to
>> update the releasing docs as I go, so it will be easier for others
>> to do this in the future.
>>
>> Second, we should decide on the "unscheduled" issues - all should be
>> assigned a fix version or closed as WONTFIX, INVALID, INCOMPLETE,
>> etc.  Any that are incomplete, we can add a comment requesting that
>> they be reopened if and when more info becomes available.
>>
>> Third, I would like to squeeze in 330, 331, 332.  The patch in 332
>> almost resolves all three of these.  All that is needed are test
>> cases and I have started working on these.  If I run into problems,
>> I will mark these issues as 2.2.  I will fish or cut bait on this
>> today.
> 
> I am still working on this. I ran into a few problems with the
> density implementations, but should be able to fix and complete
> tests in the next couple of days.  Would appreciate others weighing
> in on uncategorized issues.  OLS regression (350 - good catch!) also
> obviously needs to be fixed.  Thanks to all for testing.
> 
> Phil
> 
>> Finally, we should review the Clirr report and verify that @since
>> tags have been added for anything new since 2.0.  I know I just
>> missed a couple of them.
>>

I am almost done now with MATH-330-332 (add densities for all
continuous distributions).  I took a stab at categorizing the
uncategorized issues.  I put some of them at 2.1, meaning we need to
resolve them prior to release.  Patches / opinions that it is OK to
push some of them out welcome.

Phil

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] 2.1 Release plan

Posted by Ted Dunning <te...@gmail.com>.
Computation of the largest few singular values (often associated with a
partial set of singular vectors, either left, right or both) is very much
the business of the SVD algorithm because computing just a few singular
values is much more efficient than computing the entire set and often
suffices just as well as the full set.

Consider a common case of a sparse matrix that has 100 million rows and 10
million columns that contains small counts from a random process.  Assume
also that there are, on average, about 100 nonzero elements per row.  There
is no chance that finding all 10 million singular values or vectors is of
any practical interest, partially because the truncation due to the counting
process would obscure all but a the largest few values and practically
because computing all of the singular values and vectors would exhaust any
computer that is likely to exist in our lifetime.  The computation of the
first few dozen singular values is quite feasible, however, even with
current hardware.

On Sun, Mar 14, 2010 at 1:44 PM, Dimitri Pourbaix
<po...@astro.ulb.ac.be>wrote:

> I think there are two parts in the isse.
>> The first one is related to sparse matrix and we don't have an answer
>> yet. The second part is related to compute a partial set of singular
>> values. This is used for example in image compression or to find a
>> matrix with reduce rank that is the closest possible to an input matrix.
>> for this part, we may have an answer.
>>
>
> What do you mean by partial set of singular values?  If you mean setting
> all the singular values which are either below some threshold, of index
> above some value (rank), ... to zero and to compute the resulting product
> as an approximation of the original matrix, this is no longer the business
> of SVD but rather the user business as (s)he decides what (s)he does with
> the decomposition.  However, one could add a method to SVD which would
> return such a 'product'.

Re: [math] 2.1 Release plan

Posted by lu...@free.fr.
----- "Dimitri Pourbaix" <po...@astro.ulb.ac.be> a écrit :

> Luc,
> 
> > I think there are two parts in the isse.
> > The first one is related to sparse matrix and we don't have an
> answer
> > yet. The second part is related to compute a partial set of
> singular
> > values. This is used for example in image compression or to find a
> > matrix with reduce rank that is the closest possible to an input
> matrix.
> > for this part, we may have an answer.
> 
> What do you mean by partial set of singular values?  If you mean
> setting
> all the singular values which are either below some threshold, of
> index
> above some value (rank), ... to zero and to compute the resulting
> product
> as an approximation of the original matrix, this is no longer the
> business
> of SVD but rather the user business as (s)he decides what (s)he does
> with
> the decomposition.  However, one could add a method to SVD which
> would
> return such a 'product'.

This is the business of SVD. See for example <http://public.lanl.gov/mewall/kluwer2002.html>, or <http://www2.imm.dtu.dk/~pch/Projekter/tsvd.html>, or even <http://en.wikipedia.org/wiki/Singular_value_decomposition#Matrix_approximation> and <http://en.wikipedia.org/wiki/Singular_value_decomposition#Reduced_SVDs>.

The point in these methods is not to have an exact representation of A = USVt but to truncate it in order to have a matrix A' with some interesting properties. If the initial matrix has a very large dimension, rather than computing the full decomposition and later withdrawing smaller singular values and associated vectors, we prefer to not compute them.

Luc

> 
> Regards,
>   Dim.
> ----------------------------------------------------------------------------
> Dimitri Pourbaix                         *
> Institut d'Astronomie et d'Astrophysique *      Don't worry, be happy
> CP 226, office 2.N4.211, building NO     *         and CARPE DIEM.
> Universite Libre de Bruxelles            *
> Boulevard du Triomphe                    *      Tel : +32-2-650.35.71
>   B-1050 Bruxelles                        *      Fax :
> +32-2-650.42.26
> http://sb9.astro.ulb.ac.be/~pourbaix     *
> mailto:pourbaix@astro.ulb.ac.be
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] 2.1 Release plan

Posted by Dimitri Pourbaix <po...@astro.ulb.ac.be>.
Luc,

> I think there are two parts in the isse.
> The first one is related to sparse matrix and we don't have an answer
> yet. The second part is related to compute a partial set of singular
> values. This is used for example in image compression or to find a
> matrix with reduce rank that is the closest possible to an input matrix.
> for this part, we may have an answer.

What do you mean by partial set of singular values?  If you mean setting
all the singular values which are either below some threshold, of index
above some value (rank), ... to zero and to compute the resulting product
as an approximation of the original matrix, this is no longer the business
of SVD but rather the user business as (s)he decides what (s)he does with
the decomposition.  However, one could add a method to SVD which would
return such a 'product'.

Regards,
  Dim.
----------------------------------------------------------------------------
Dimitri Pourbaix                         *
Institut d'Astronomie et d'Astrophysique *      Don't worry, be happy
CP 226, office 2.N4.211, building NO     *         and CARPE DIEM.
Universite Libre de Bruxelles            *
Boulevard du Triomphe                    *      Tel : +32-2-650.35.71
  B-1050 Bruxelles                        *      Fax : +32-2-650.42.26
http://sb9.astro.ulb.ac.be/~pourbaix     * mailto:pourbaix@astro.ulb.ac.be

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] 2.1 Release plan

Posted by Ted Dunning <te...@gmail.com>.
For sparse matrices, almost the only requirement is a good dense SVD.  If
you have an SVD that is very fast for tall skinny matrices, that would be
excellent.  If not, that can be worked around.

The basic idea is that right multiplication by a tall skinny random array
(which is never stored explicitly) gives a tall skinny matrix to decompose.
If the tall random array has a small multiple more columns than you want
singular values, then you can say with high probability that the left
singular vectors of the product contains the left singular vectors of the
original matrix.  Peeling these off and doing an SVD to the short wide
result gives you the singular values and right singular vectors.

This is vastly simpler than the more commonly used Lanczos algorithm and
more susceptible to performance improvement due to BLAS improvements,
threading or parallelization because it involves matrix-matrix products
rather than matrix-vector products.

These techniques are described here
http://arxiv4.library.cornell.edu/abs/0909.4061

On Sun, Mar 14, 2010 at 12:21 PM, Luc Maisonobe <Lu...@free.fr>wrote:

> The first one is related to sparse matrix and we don't have an answer
> yet.
>

Re: [math] 2.1 Release plan

Posted by Luc Maisonobe <Lu...@free.fr>.
Dimitri Pourbaix a écrit :
> Luc,
> 
>> Concerning MATH-321, the partial fix I proposed has been withdrawn by
>> the complete rewrite of SVD.
>> This partial fix also was not considered mathematically sound by Dimitri
>> since the dimensions of the matrices did not match complete SVD. Well,
>> this was in fact desired as MATH-321 explicitly asks for non-full SVD.
>>
>> So I'm not sure anymore what we need to do. If we follow Dimitri's
>> rationale, we should close this as WON'T FIX. If we decide to fix it, I
>> think we should postpone it to after 2.1 to have time to decide how we
>> solve it. SVD has already been vastly improved by Dimitri's work, so I
>> would really much like to see it published now.
>>
>> Dim, what do you think about this ?
> 
> The version I uploaded does not benefit from any sparsity of the matrix
> to be decomposed.  I have not looked at any algorithm for this kind of
> matrix as it would likely involve a different way of storing the matrix
> itself.  I agree to put "WON'T FIX" for 321 unless someone can come up
> with an example showing that that, for a sparse matrix, the present
> version does not return the right answer.

I think there are two parts in the isse.
The first one is related to sparse matrix and we don't have an answer
yet. The second part is related to compute a partial set of singular
values. This is used for example in image compression or to find a
matrix with reduce rank that is the closest possible to an input matrix.
for this part, we may have an answer.

So for the first part, I think WON'T FIX may be an answer (or
INCOMPLETE, asking for a patch ...). For the second part, I would
suggest to simply postpone to 3.0.

Luc

> 
> Dim.
> ----------------------------------------------------------------------------
> 
> Dimitri Pourbaix                         *
> Institut d'Astronomie et d'Astrophysique *      Don't worry, be happy
> CP 226, office 2.N4.211, building NO     *         and CARPE DIEM.
> Universite Libre de Bruxelles            *
> Boulevard du Triomphe                    *      Tel : +32-2-650.35.71
>  B-1050 Bruxelles                        *      Fax : +32-2-650.42.26
> http://sb9.astro.ulb.ac.be/~pourbaix     * mailto:pourbaix@astro.ulb.ac.be
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] 2.1 Release plan

Posted by Dimitri Pourbaix <po...@astro.ulb.ac.be>.
Luc,

> Concerning MATH-321, the partial fix I proposed has been withdrawn by
> the complete rewrite of SVD.
> This partial fix also was not considered mathematically sound by Dimitri
> since the dimensions of the matrices did not match complete SVD. Well,
> this was in fact desired as MATH-321 explicitly asks for non-full SVD.
>
> So I'm not sure anymore what we need to do. If we follow Dimitri's
> rationale, we should close this as WON'T FIX. If we decide to fix it, I
> think we should postpone it to after 2.1 to have time to decide how we
> solve it. SVD has already been vastly improved by Dimitri's work, so I
> would really much like to see it published now.
>
> Dim, what do you think about this ?

The version I uploaded does not benefit from any sparsity of the matrix
to be decomposed.  I have not looked at any algorithm for this kind of
matrix as it would likely involve a different way of storing the matrix
itself.  I agree to put "WON'T FIX" for 321 unless someone can come up
with an example showing that that, for a sparse matrix, the present
version does not return the right answer.

Dim.
----------------------------------------------------------------------------
Dimitri Pourbaix                         *
Institut d'Astronomie et d'Astrophysique *      Don't worry, be happy
CP 226, office 2.N4.211, building NO     *         and CARPE DIEM.
Universite Libre de Bruxelles            *
Boulevard du Triomphe                    *      Tel : +32-2-650.35.71
  B-1050 Bruxelles                        *      Fax : +32-2-650.42.26
http://sb9.astro.ulb.ac.be/~pourbaix     * mailto:pourbaix@astro.ulb.ac.be

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org


Re: [math] 2.1 Release plan

Posted by Luc Maisonobe <Lu...@free.fr>.
Phil Steitz a écrit :
> Phil Steitz wrote:
>> Phil Steitz wrote:
>>> luc.maisonobe@free.fr wrote:
>>>> ----- "Phil Steitz (JIRA)" <ji...@apache.org> a écrit :
>>>>
>>>>> [
>>>>> https://issues.apache.org/jira/browse/MATH-282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
>>>>> ]
>>>>>
>>>>> Phil Steitz resolved MATH-282.
>>>>> ------------------------------
>>>>>
>>>>>     Resolution: Fixed
>>>> Great !
>>>>
>>>> With this fix, there are no pending issues for 2.1. Gilles is working on two unscheduled issues.
>>>> I suggest once he has committed the fixes, we can go ahead to release 2.1.
>>>>
>>>> Any thoughts about this ?
>>> First, let me volunteer to RM.  I will take the opportunity to
>>> update the releasing docs as I go, so it will be easier for others
>>> to do this in the future.
>>>
>>> Second, we should decide on the "unscheduled" issues - all should be
>>> assigned a fix version or closed as WONTFIX, INVALID, INCOMPLETE,
>>> etc.  Any that are incomplete, we can add a comment requesting that
>>> they be reopened if and when more info becomes available.
>>>
>>> Third, I would like to squeeze in 330, 331, 332.  The patch in 332
>>> almost resolves all three of these.  All that is needed are test
>>> cases and I have started working on these.  If I run into problems,
>>> I will mark these issues as 2.2.  I will fish or cut bait on this
>>> today.
>> I am still working on this. I ran into a few problems with the
>> density implementations, but should be able to fix and complete
>> tests in the next couple of days.  Would appreciate others weighing
>> in on uncategorized issues.  OLS regression (350 - good catch!) also
>> obviously needs to be fixed.  Thanks to all for testing.
>>
>> Phil
>>
>>> Finally, we should review the Clirr report and verify that @since
>>> tags have been added for anything new since 2.0.  I know I just
>>> missed a couple of them.
>>>
> 
> I am almost done now with MATH-330-332 (add densities for all
> continuous distributions).  I took a stab at categorizing the
> uncategorized issues.  I put some of them at 2.1, meaning we need to
> resolve them prior to release.  Patches / opinions that it is OK to
> push some of them out welcome.

Concerning MATH-321, the partial fix I proposed has been withdrawn by
the complete rewrite of SVD.
This partial fix also was not considered mathematically sound by Dimitri
since the dimensions of the matrices did not match complete SVD. Well,
this was in fact desired as MATH-321 explicitly asks for non-full SVD.

So I'm not sure anymore what we need to do. If we follow Dimitri's
rationale, we should close this as WON'T FIX. If we decide to fix it, I
think we should postpone it to after 2.1 to have time to decide how we
solve it. SVD has already been vastly improved by Dimitri's work, so I
would really much like to see it published now.

Dim, what do you think about this ?

Luc


> 
> Phil
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
> For additional commands, e-mail: dev-help@commons.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@commons.apache.org
For additional commands, e-mail: dev-help@commons.apache.org