You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@commons.apache.org by "Benoit de Rancourt (JIRA)" <ji...@apache.org> on 2012/06/12 17:15:43 UTC

[jira] [Created] (MATH-805) Percentile calculation is very slow when input data are constants

Benoit de Rancourt created MATH-805:
---------------------------------------

             Summary: Percentile calculation is very slow when input data are constants
                 Key: MATH-805
                 URL: https://issues.apache.org/jira/browse/MATH-805
             Project: Commons Math
          Issue Type: Improvement
    Affects Versions: 3.0
            Reporter: Benoit de Rancourt
            Priority: Minor


I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Comment Edited] (MATH-805) Percentile calculation is very slow when input data are constants

Posted by "Benoit de Rancourt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416943#comment-13416943 ] 

Benoit de Rancourt edited comment on MATH-805 at 7/18/12 8:42 AM:
------------------------------------------------------------------

Hello,

Here is a simple test case :

{code:title=testPercentile.java|borderStyle=solid}
/**
 * Test the Percentile calculation
 */
public static void testPercentile() {
	
	final double CONST_NUMBER = 18.;
	final double PERCENT = 5.;
	final int DISTRIBUTION_SIZE = (int) 1e5;

	double[] distribution = new double[DISTRIBUTION_SIZE];
	Percentile percentile = new Percentile(PERCENT);
	Random random = new Random(System.nanoTime());
	
	// filling the array with random number
	for (int i = 0; i < distribution.length; i++) {
		distribution[i] = random.nextDouble() * 100.;
	}
	
	System.out.println("Start the calculation with random array");
	long begin = System.currentTimeMillis();
	double result = percentile.evaluate(distribution);
	long end = System.currentTimeMillis();
	System.out.println("duration : " + (end - begin) + "ms");
	System.out.println("result : " + result);
	
	// filling the array with a constant number
	for (int i = 0; i < distribution.length; i++) {
		distribution[i] = CONST_NUMBER;
	}
	
	System.out.println("Start the calculation with constant array");
	begin = System.currentTimeMillis();
	result = percentile.evaluate(distribution);
	end = System.currentTimeMillis();
	System.out.println("duration : " + (end - begin) + "ms");
	System.out.println("result : " + result);
}
{code}

Thanks,
Benoit.
                
      was (Author: teraben):
    Hello,

Here is a simple test case :
	
/**
 * Test the Percentile calculation
 */
public static void testQuantile() {
	
	final double CONST_NUMBER = 18.;
	final double PERCENT = 5.;
	final int DISTRIBUTION_SIZE = (int) 1e5;

	double[] distribution = new double[DISTRIBUTION_SIZE];
	Percentile percentile = new Percentile(PERCENT);
	Random random = new Random(System.nanoTime());
	
	// filling the array with random number
	for (int i = 0; i < distribution.length; i++) {
		distribution[i] = random.nextDouble() * 100.;
	}
	
	System.out.println("Start the calculation with random array");
	long begin = System.currentTimeMillis();
	double result = percentile.evaluate(distribution);
	long end = System.currentTimeMillis();
	System.out.println("duration : " + (end - begin) + "ms");
	System.out.println("result : " + result);
	
	// filling the array with a constant number
	for (int i = 0; i < distribution.length; i++) {
		distribution[i] = CONST_NUMBER;
	}
	
	System.out.println("Start the calculation with constant array");
	begin = System.currentTimeMillis();
	result = percentile.evaluate(distribution);
	end = System.currentTimeMillis();
	System.out.println("duration : " + (end - begin) + "ms");
	System.out.println("result : " + result);
}

Thanks,
Benoit.
                  
> Percentile calculation is very slow when input data are constants
> -----------------------------------------------------------------
>
>                 Key: MATH-805
>                 URL: https://issues.apache.org/jira/browse/MATH-805
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Benoit de Rancourt
>            Priority: Minor
>              Labels: performance, test
>
> I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (MATH-805) Percentile calculation is very slow when input data are constants

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart resolved MATH-805.
----------------------------------

    Resolution: Duplicate

Duplicate of MATH-578
                
> Percentile calculation is very slow when input data are constants
> -----------------------------------------------------------------
>
>                 Key: MATH-805
>                 URL: https://issues.apache.org/jira/browse/MATH-805
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Benoit de Rancourt
>            Assignee: Thomas Neidhart
>            Priority: Minor
>              Labels: performance, test
>             Fix For: 3.1
>
>
> I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MATH-805) Percentile calculation is very slow when input data are constants

Posted by "Benoit de Rancourt (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416943#comment-13416943 ] 

Benoit de Rancourt commented on MATH-805:
-----------------------------------------

Hello,

Here is a simple test case :
	
/**
 * Test the Percentile calculation
 */
public static void testQuantile() {
	
	final double CONST_NUMBER = 18.;
	final double PERCENT = 5.;
	final int DISTRIBUTION_SIZE = (int) 1e5;

	double[] distribution = new double[DISTRIBUTION_SIZE];
	Percentile percentile = new Percentile(PERCENT);
	Random random = new Random(System.nanoTime());
	
	// filling the array with random number
	for (int i = 0; i < distribution.length; i++) {
		distribution[i] = random.nextDouble() * 100.;
	}
	
	System.out.println("Start the calculation with random array");
	long begin = System.currentTimeMillis();
	double result = percentile.evaluate(distribution);
	long end = System.currentTimeMillis();
	System.out.println("duration : " + (end - begin) + "ms");
	System.out.println("result : " + result);
	
	// filling the array with a constant number
	for (int i = 0; i < distribution.length; i++) {
		distribution[i] = CONST_NUMBER;
	}
	
	System.out.println("Start the calculation with constant array");
	begin = System.currentTimeMillis();
	result = percentile.evaluate(distribution);
	end = System.currentTimeMillis();
	System.out.println("duration : " + (end - begin) + "ms");
	System.out.println("result : " + result);
}

Thanks,
Benoit.
                
> Percentile calculation is very slow when input data are constants
> -----------------------------------------------------------------
>
>                 Key: MATH-805
>                 URL: https://issues.apache.org/jira/browse/MATH-805
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Benoit de Rancourt
>            Priority: Minor
>              Labels: performance, test
>
> I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Resolved] (MATH-805) Percentile calculation is very slow when input data are constants

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart resolved MATH-805.
----------------------------------

       Resolution: Fixed
    Fix Version/s: 3.1

Fixed in r1364318.
                
> Percentile calculation is very slow when input data are constants
> -----------------------------------------------------------------
>
>                 Key: MATH-805
>                 URL: https://issues.apache.org/jira/browse/MATH-805
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Benoit de Rancourt
>            Assignee: Thomas Neidhart
>            Priority: Minor
>              Labels: performance, test
>             Fix For: 3.1
>
>
> I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MATH-805) Percentile calculation is very slow when input data are constants

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13414511#comment-13414511 ] 

Thomas Neidhart commented on MATH-805:
--------------------------------------

Hi Benoit,

could you please provide a (simple) test case to demonstrate the behaviour you describe?

Thanks,

Thomas
                
> Percentile calculation is very slow when input data are constants
> -----------------------------------------------------------------
>
>                 Key: MATH-805
>                 URL: https://issues.apache.org/jira/browse/MATH-805
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Benoit de Rancourt
>            Priority: Minor
>              Labels: performance, test
>
> I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (MATH-805) Percentile calculation is very slow when input data are constants

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/MATH-805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13417717#comment-13417717 ] 

Thomas Neidhart commented on MATH-805:
--------------------------------------

The evaluation of the percentile uses a partition method to sort the data around a pivot element (element that are smaller are before the pivot element, and elements that are larger are behind it).

Now in case of an array with lots of equal values, this can lead to some kind of staircase situation, where the pivot element is just slowly increased from the bottom index and the full array of values has to be evaluated in each iteration. The problem is in the partition method of Percentile.java:

{noformat}
    while (i < j) {
       while ((i < j) && (work[j] >= value)) {
           --j;
       }
       while ((i < j) && (work[i] <= value)) {
           ++i;
       }
    ...
{noformat}

The comparisons here use >= and <= while it should be > and <.
                
> Percentile calculation is very slow when input data are constants
> -----------------------------------------------------------------
>
>                 Key: MATH-805
>                 URL: https://issues.apache.org/jira/browse/MATH-805
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Benoit de Rancourt
>            Assignee: Thomas Neidhart
>            Priority: Minor
>              Labels: performance, test
>
> I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Reopened] (MATH-805) Percentile calculation is very slow when input data are constants

Posted by "Thomas Neidhart (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/MATH-805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Thomas Neidhart reopened MATH-805:
----------------------------------

    
> Percentile calculation is very slow when input data are constants
> -----------------------------------------------------------------
>
>                 Key: MATH-805
>                 URL: https://issues.apache.org/jira/browse/MATH-805
>             Project: Commons Math
>          Issue Type: Improvement
>    Affects Versions: 3.0
>            Reporter: Benoit de Rancourt
>            Assignee: Thomas Neidhart
>            Priority: Minor
>              Labels: performance, test
>             Fix For: 3.1
>
>
> I use the Percentile class to calculate quantile on a big array (10^6 entries). When I have to test the performance of my code, I notice that the calculation of quantile is at least 100x slower when my data are constants (10^6 of the same nomber). Maybe the Percentile calculation can be improved for this special case.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira