Posted to issues@commons.apache.org by "Gilles (JIRA)" <ji...@apache.org> on 2017/02/24 13:41:44 UTC

[jira] [Commented] (MATH-1403) Collinearity test: QR Decomposition rank incorrect (SVD ok)

    [ https://issues.apache.org/jira/browse/MATH-1403?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15882686#comment-15882686 ] 

Gilles commented on MATH-1403:
------------------------------

It looks wrong indeed.
The Javadoc mentions "When a large fall in norm is seen, the rank is returned", which is of little help in selecting an appropriate threshold value.

As human resources have become scarce for the Commons Math project, you are most welcome to look at the code in order to find the bug.
I've slightly modified your example (transformed into a unit test):
{code}
    @Test
    public void testMath1403() {
        final double delta = 1e-7; // Test fails when delta <= 1e-8.
        final double[][] m = {
            {1, 1, 1, 1 + delta, 1, 1},
            {1, 1, 1, delta, 0, 0},
            {0, 0, 0, 1, 1, 1},
            {1, 0, 0, 1, 0, 0},
            {0, 0, 1, 0, 0, 1}
        };

        final RRQRDecomposition qr = new RRQRDecomposition(new Array2DRowRealMatrix(m));
        final double dropThreshold = 1e-7; // Test fails when dropThreshold <= 1e-8.
        Assert.assertEquals(4, qr.getRank(dropThreshold));
    }
{code}
It hints at a numerical problem...
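
To rule out the matrix itself as the source of the problem, the expected rank can be cross-checked without Commons Math. The following is a minimal sketch, not part of the library: the class name RankCheck, the method rank and the tolerance 1e-12 are chosen for illustration only. It runs Gaussian elimination with partial pivoting on the matrix of the unit test above, using a delta for which the RRQR-based test fails, and counts the pivots whose magnitude exceeds the tolerance.

```java
// Illustrative helper, not part of Commons Math.
final class RankCheck {

    /**
     * Estimates the numerical rank of a matrix by Gaussian elimination
     * with partial pivoting. Pivots with magnitude <= tol count as zero.
     */
    static int rank(double[][] a, double tol) {
        final int rows = a.length;
        final int cols = a[0].length;

        // Work on a copy so that the caller's data is left untouched.
        final double[][] m = new double[rows][];
        for (int i = 0; i < rows; i++) {
            m[i] = a[i].clone();
        }

        int rank = 0;
        for (int c = 0; c < cols && rank < rows; c++) {
            // Pick the largest entry of column c among the remaining rows.
            int p = rank;
            for (int i = rank + 1; i < rows; i++) {
                if (Math.abs(m[i][c]) > Math.abs(m[p][c])) {
                    p = i;
                }
            }
            if (Math.abs(m[p][c]) <= tol) {
                continue; // No usable pivot in this column.
            }

            // Move the pivot row into position.
            final double[] tmp = m[p];
            m[p] = m[rank];
            m[rank] = tmp;

            // Eliminate column c from the rows below the pivot.
            for (int i = rank + 1; i < rows; i++) {
                final double f = m[i][c] / m[rank][c];
                for (int j = c; j < cols; j++) {
                    m[i][j] -= f * m[rank][j];
                }
            }
            rank++;
        }
        return rank;
    }

    public static void main(String[] args) {
        final double delta = 1e-9; // A value for which the unit test above fails.
        final double[][] m = {
            {1, 1, 1, 1 + delta, 1, 1},
            {1, 1, 1, delta, 0, 0},
            {0, 0, 0, 1, 1, 1},
            {1, 0, 0, 1, 0, 0},
            {0, 0, 1, 0, 0, 1}
        };
        System.out.println("Rank by elimination: " + rank(m, 1e-12));
    }
}
```

Elimination is used here only because it is short; it is a sanity check on the expected value of 4, not a replacement for a rank-revealing decomposition on ill-conditioned input.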


> Collinearity test: QR Decomposition rank incorrect (SVD ok)
> -----------------------------------------------------------
>
>                 Key: MATH-1403
>                 URL: https://issues.apache.org/jira/browse/MATH-1403
>             Project: Commons Math
>          Issue Type: Bug
>    Affects Versions: 3.6.1
>         Environment: Linux ubuntu
> JDK 8
>            Reporter: Hugo Ferrira
>
> Hello,
> I am aware that such a question has been asked before but I cannot seem to solve this issue for a very simple example. The closest example I have is:
> https://issues.apache.org/jira/browse/MATH-1100
> from which I could not get an answer.
> I am trying to copy an algorithm from R's Caret package that identifies collinear columns of a matrix [1]. I am assuming a "long" matrix and am using the trivial example from the reference above. However, I cannot get this to work because the QR's rank result is incorrect.
> I have the following example:
> import org.apache.commons.math3.linear.RealMatrix;
> import org.apache.commons.math3.linear.RRQRDecomposition;
> import org.apache.commons.math3.linear.Array2DRowRealMatrix;
> import org.apache.commons.math3.linear.SingularValueDecomposition ;
> public class QRIssue {
>   public static void main(String[] args) {
>     double[][] am = new double[5][];
>     double[] c1 = new double[] {1.0, 1.0, 1.0, 1.0, 1.0, 1.0} ;
>     double[] c2 = new double[] {1.0, 1.0, 1.0, 0.0, 0.0, 0.0} ;
>     double[] c3 = new double[] {0.0, 0.0, 0.0, 1.0, 1.0, 1.0} ;
>     double[] c4 = new double[] {1.0, 0.0, 0.0, 1.0, 0.0, 0.0 } ;
>     double[] c6 = new double[] {0.0, 0.0, 1.0, 0.0, 0.0, 1.0 } ;
>     am[0] = c1 ;
>     am[1] = c2 ;
>     am[2] = c3 ;
>     am[3] = c4 ;
>     am[4] = c6 ;
>     Double threshold = 1e-1;
>     Array2DRowRealMatrix m = new Array2DRowRealMatrix( am, false )  ; // use array, don't copy
>     RRQRDecomposition qr = new RRQRDecomposition( m,  threshold) ;
>     RealMatrix r = qr.getR() ;
>     int numColumns = r.getColumnDimension() ;
>     int rank = qr.getRank( threshold ) ;
>     System.out.println("QR rank: " + rank) ;
>     System.out.println("QR is singular: " + !qr.getSolver().isNonSingular()) ;
>     System.out.println("QR is singular: " + (numColumns != rank) ) ;
>     SingularValueDecomposition sv2 = new org.apache.commons.math3.linear.SingularValueDecomposition(m);
>     System.out.println("SVD rank: " + sv2.getRank()) ;
>     }
> }
> For SVD I get a rank of 4, which is correct (columns 0, 1, 2 are collinear: c0 = c1 + c2). But for QR I get 5. I have tried several thresholds with no success. For several subsets of the columns above (for example, only 0, 1, 2) I get the correct answer. What am I doing wrong?
> TIA,
> Hugo F.
> 1. https://topepo.github.io/caret/pre-processing.html#lindep


