Posted to user@commons.apache.org by video axescon <vi...@axescon.com> on 2010/09/27 05:36:48 UTC

[math] autocorr

Hello

I can't find an autocorrelation function in the stats package. Is there a reason why
it wasn't implemented?
It exists in the Colt project.

cheers

Re: [math] autocorr

Posted by Phil Steitz <ph...@gmail.com>.
Autocorrelation and Logit regression would fit nicely into Commons Math.
You are welcome to join us as a contributor if you would like to work on
these things here.  It would be best to take the discussion of how and what
to implement to the commons developers list (
http://commons.apache.org/mail-lists.html).

Phil


On 9/27/10, video axescon <vi...@axescon.com> wrote:
>
> Cholesky decomposition is used often for generating correlated random
> sequences.
>
> if I wanted to update OLS/GLM type of regressions and add Logit regression
> estimation. do you think Mahout could be the right place to contribute? or
> does it sound that it's outside Mahout's domain?
>
> On Mon, Sep 27, 2010 at 1:39 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > Mahout's primary math support is inherited from Colt, but we are actively
> > deleting capabilities from Colt that we don't think
> > will contribute to scalable data mining goal because if we are going to
> use
> > any capability from Colt, we need to spend significant
> > effort to build tests for the code and we don't want to carry around a
> > bunch
> > of code that isn't useful.
> >
> >
>

RE: [math]How to do standardizing (normalizing)

Posted by Martin Gainty <mg...@hotmail.com>.
Good Morning Erik/Phil

I just encountered a bug in the maven-surefire-api factory, which went something like this:
all primitive array references such as double[], float[], int[] tripped up the factory loader.

Reason:
a primitive array is not a Class that the factory is looking for.

So what I did, for each situation where I had double[] double_array, was:

ArrayList<Double> double_array_list = new ArrayList<Double>();

// box each primitive element and collect it into the list
for (int d = 0; d < double_array.length; d++) {
    double_array_list.add(Double.valueOf(double_array[d]));
}

Now my surefire factories (those that set up the surefire test cases) properly discover the unique class object for double_array_list.

If anyone else has run into this anomaly and has a solution, please let me know.

 

thanks,
Martin Gainty 

> Date: Sat, 2 Oct 2010 20:50:49 -0400
> From: phil.steitz@gmail.com
> To: user@commons.apache.org
> Subject: Re: [math]How to do standardizing (normalizing)
> 
> On 10/1/10 8:32 AM, VanIngen, Erik (FIPS) wrote:
> > Hi Luc and others,
> >
> > I have written the standardize function by myself (see below, including the tests). Would it be possible to have this added to Apache Math Commons?
> >
> 
> Thanks for contributing!
> 
> We should take discussion of this new feature to the dev list. It 
> would be great if you could open a JIRA ticket and attach a patch 
> including implementation code.
> 
> We can talk about how to integrate this into [math] in JIRA comments 
> and / or on the dev list. For now, I will just say that the 
> simplest way to add this would be to add a static method called 
> something like "normalize" to org.apache.commons.math.stat.StatUtils.
> 
> See http://commons.apache.org/patches.html for info on how to create 
> patches and attach them to JIRA tickets. Do not hesitate to ask 
> either on dev list or in private emails if you need help getting set up.
> 
> Thanks!
> 
> Phil
> 
> >
> >
> >
> >
> >
> > /**
> >  * The standardise function does not seem to be in Apache Commons Math.
> >  *
> >  * @author Erik van Ingen
> >  */
> > public class Standardize {
> >
> >         /**
> >          * Standardise the series, so that in the end it has a mean of 0 and a standard deviation of 1.
> >          *
> >          * @param series
> >          * @return
> >          */
> >         public static double[] run(double[] series) {
> >                 DescriptiveStatistics stats = new DescriptiveStatistics();
> >
> >                 // Add the data from the array
> >                 for (int i = 0; i < series.length; i++) {
> >                         stats.addValue(series[i]);
> >                 }
> >
> >                 // Compute mean and standard deviation
> >                 double currentMean = stats.getMean();
> >                 double currentstandardDeviation = stats.getStandardDeviation();
> >
> >                 // z = (x- mean)/standardDeviation
> >                 double[] newSeries = new double[series.length];
> >
> >                 for (int i = 0; i < series.length; i++) {
> >                         newSeries[i] = (series[i] - currentMean) / currentstandardDeviation;
> >                 }
> >                 return newSeries;
> >         }
> >
> > }
> >
> >
> >
> > public class StandardizeTest {
> >
> >         /**
> >          * Run the test with the values 50 and 100 and expect standardized values within a tolerance of 0.01.
> >          */
> >         @Test
> >         public void testRun1() {
> >                 double series[] = { 50, 100 };
> >                 double expectedSeries[] = { -0.7, 0.7 };
> >                 double[] out = Standardize.run(series);
> >                 for (int i = 0; i < out.length; i++) {
> >                         assertEquals(out[i], expectedSeries[i], 0.01);
> >                 }
> >
> >         }
> >
> >         /**
> >          * Run with 77 random values, expecting that the outcome has a mean of 0 and a standard deviation of 1.
> >          */
> >         @Test
> >         public void testRun2() {
> >                 int length = 77;
> >                 double series[] = new double[length];
> >
> >                 for (int i = 0; i < length; i++) {
> >                         series[i] = Math.random();
> >                 }
> >
> >                 double standardizedSeries[] = Standardize.run(series);
> >
> >                 DescriptiveStatistics stats = new DescriptiveStatistics();
> >
> >                 // Add the data from the array
> >                 for (int i = 0; i < length; i++) {
> >                         stats.addValue(standardizedSeries[i]);
> >                 }
> >
> >                 double distance = 1E-10;
> >                 assertEquals(0.0, stats.getMean(), distance);
> >                 assertEquals(1.0, stats.getStandardDeviation(), distance);
> >
> >         }
> >
> > }
> >
> >
> >
> >
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Luc Maisonobe [mailto:Luc.Maisonobe@free.fr]
> > Sent: 29 September 2010 18:54
> > To: Commons Users List
> > Subject: Re: [math]How to do standardizing (normalizing)
> >
> >
> > Le 29/09/2010 12:13, VanIngen, Erik (FIPS) a écrit :
> >> Hi Apache Commons Math users
> >>
> >> I am looking for an easy way of standardizing my values a mean 0 and a
> >> standard deviation of 1. What is the best way to do that?
> >>
> >> I have tried this:
> >> DescriptiveStatistics stats = new DescriptiveStatistics();
> >> // adding values
> >> ....
> >> // Compute Mean and StandardDeviation
> >> double mean = stats.getMean();
> >> double std = stats.getStandardDeviation();
> >>
> >> and then standardize each value according z = (x- mean)/std
> >>
> >> But I would like to have just a function of standardize an array
> >> according the parameters mean and std. Is there something like this in
> >> Apache Math Commons?
> >
> > I don't think we have such a function.
> >
> > Luc
> >
> >>
> >> Erik
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> >> For additional commands, e-mail: user-help@commons.apache.org
> >>
> >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> > For additional commands, e-mail: user-help@commons.apache.org
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> > For additional commands, e-mail: user-help@commons.apache.org
> >
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
> 
 		 	   		  

RE: [math]How to do standardizing (normalizing)

Posted by "VanIngen, Erik (FIPS)" <Er...@fao.org>.
Hi Phil,

I have created an issue
https://issues.apache.org/jira/browse/MATH-426
and added the code as a svn diff patch to StatUtils.

For 'Affected Version' I have noted 'Nightly Builds', which might not be the correct choice.

Looking forward to the follow-up!

Cheers,
Erik van Ingen






-----Original Message-----
From: Phil Steitz [mailto:phil.steitz@gmail.com]
Sent: 03 October 2010 02:51
To: Commons Users List
Subject: Re: [math]How to do standardizing (normalizing)


On 10/1/10 8:32 AM, VanIngen, Erik (FIPS) wrote:
> Hi Luc and others,
>
> I have written the standardize function by myself (see below,
> including the tests). Would it be possible to have this added to
> Apache Math Commons?
>

Thanks for contributing!

We should take discussion of this new feature to the dev list.  It
would be great if you could open a JIRA ticket and attach a patch
including implementation code.

We can talk about how to integrate this into [math] in JIRA comments
and / or on the dev list.  For now, I will just say that the
simplest way to add this would be to add a static method called
something like "normalize" to org.apache.commons.math.stat.StatUtils.

See http://commons.apache.org/patches.html for info on how to create
patches and attach them to JIRA tickets.  Do not hesitate to ask
either on dev list or in private emails if you need help getting set up.

Thanks!

Phil

>
>
>
>
>
> /**
>   * The standardise function does not seem to be in Apache math commons.
>   *
>   *
>   * @author Erik van Ingen
>   *
>   */
> public class Standardize {
>
>          /**
>           * Standardise the series, so in the end it is having mean of 0 and a standard deviation of 1.
>           *
>           *
>           * @param series
>           * @return
>           */
>          public static double[] run(double[] series) {
>                  DescriptiveStatistics stats = new DescriptiveStatistics();
>
>                  // Add the data from the array
>                  for (int i = 0; i<  series.length; i++) {
>                          stats.addValue(series[i]);
>                  }
>
>                  // Compute mean and standard deviation
>                  double currentMean = stats.getMean();
>                  double currentstandardDeviation = stats.getStandardDeviation();
>
>                  // z = (x- mean)/standardDeviation
>                  double[] newSeries = new double[series.length];
>
>                  for (int i = 0; i<  series.length; i++) {
>                          newSeries[i] = (series[i] - currentMean) / currentstandardDeviation;
>                  }
>                  return newSeries;
>          }
>
> }
>
>
>
> public class StandardizeTest {
>
>          /**
>           * Run the test with the values 50 and 100 and assume standardized values with a dinstance of 0.01
>           */
>          @Test
>          public void testRun1() {
>                  double series[] = { 50, 100 };
>                  double expectedSeries[] = { -0.7, 0.7 };
>                  double[] out = Standardize.run(series);
>                  for (int i = 0; i<  out.length; i++) {
>                          assertEquals(out[i], expectedSeries[i], 0.01);
>                  }
>
>          }
>
>          /**
>           * Run with 77 random values, assuming that the outcome has a mean of 0 and a standard deviation of 1.
>           *
>           *
>           *
>           */
>          @Test
>          public void testRun2() {
>                  int length = 77;
>                  double series[] = new double[length];
>
>                  for (int i = 0; i<  length; i++) {
>                          series[i] = Math.random();
>                  }
>
>                  double standardizedSeries[] = Standardize.run(series);
>
>                  DescriptiveStatistics stats = new DescriptiveStatistics();
>
>                  // Add the data from the array
>                  for (int i = 0; i<  length; i++) {
>                          stats.addValue(standardizedSeries[i]);
>                  }
>
>                  double distance = 1E-10;
>                  assertEquals(0.0, stats.getMean(), distance);
>                  assertEquals(1.0, stats.getStandardDeviation(), distance);
>
>          }
>
> }
>
>
>
>
>
>
>
>
>
> -----Original Message-----
> From: Luc Maisonobe [mailto:Luc.Maisonobe@free.fr]
> Sent: 29 September 2010 18:54
> To: Commons Users List
> Subject: Re: [math]How to do standardizing (normalizing)
>
>
> Le 29/09/2010 12:13, VanIngen, Erik (FIPS) a écrit :
>> Hi Apache Commons Math users
>>
>> I am looking for an easy way of standardizing my values a mean 0 and
>> a standard deviation of 1. What is the best way to do that?
>>
>> I have tried this:
>> DescriptiveStatistics stats = new DescriptiveStatistics();
>> // adding values
>> ....
>> // Compute Mean and StandardDeviation
>> double mean  = stats.getMean();
>> double std = stats.getStandardDeviation();
>>
>> and then standardize each value according z = (x- mean)/std
>>
>> But I would like to have just a function of standardize an array
>> according the parameters mean and std. Is there something like this
>> in Apache Math Commons?
>
> I don't think we have such a function.
>
> Luc
>
>>
>> Erik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> For additional commands, e-mail: user-help@commons.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: [math]How to do standardizing (normalizing)

Posted by Phil Steitz <ph...@gmail.com>.
On 10/1/10 8:32 AM, VanIngen, Erik (FIPS) wrote:
> Hi Luc and others,
>
> I have written the standardize function by myself (see below, including the tests). Would it be possible to have this added to Apache Math Commons?
>

Thanks for contributing!

We should take discussion of this new feature to the dev list.  It 
would be great if you could open a JIRA ticket and attach a patch 
including implementation code.

We can talk about how to integrate this into [math] in JIRA comments 
and / or on the dev list.  For now, I will just say that the 
simplest way to add this would be to add a static method called 
something like "normalize" to org.apache.commons.math.stat.StatUtils.

See http://commons.apache.org/patches.html for info on how to create 
patches and attach them to JIRA tickets.  Do not hesitate to ask 
either on dev list or in private emails if you need help getting set up.

Thanks!

Phil
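
For illustration, here is a minimal sketch of what such a static helper could look like on top of the existing StatUtils methods. The class name and the "normalize" method are only placeholders taken from the suggestion above, not an existing Commons Math API:

import org.apache.commons.math.stat.StatUtils;

public final class StandardizeSketch {

    private StandardizeSketch() {
    }

    /** Returns a copy of the sample rescaled to mean 0 and standard deviation 1. */
    public static double[] normalize(double[] sample) {
        double mean = StatUtils.mean(sample);                 // StatUtils already provides the mean
        double std = Math.sqrt(StatUtils.variance(sample));   // and the bias-corrected sample variance
        double[] standardized = new double[sample.length];
        for (int i = 0; i < sample.length; i++) {
            standardized[i] = (sample[i] - mean) / std;       // z = (x - mean) / std
        }
        return standardized;
    }
}

A real version would also have to decide how to handle empty input and a zero standard deviation.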

>
>
>
>
>
> /**
>   * The standardise function does not seem to be in Apache math commons.
>   *
>   *
>   * @author Erik van Ingen
>   *
>   */
> public class Standardize {
>
>          /**
>           * Standardise the series, so in the end it is having mean of 0 and a standard deviation of 1.
>           *
>           *
>           * @param series
>           * @return
>           */
>          public static double[] run(double[] series) {
>                  DescriptiveStatistics stats = new DescriptiveStatistics();
>
>                  // Add the data from the array
>                  for (int i = 0; i<  series.length; i++) {
>                          stats.addValue(series[i]);
>                  }
>
>                  // Compute mean and standard deviation
>                  double currentMean = stats.getMean();
>                  double currentstandardDeviation = stats.getStandardDeviation();
>
>                  // z = (x- mean)/standardDeviation
>                  double[] newSeries = new double[series.length];
>
>                  for (int i = 0; i<  series.length; i++) {
>                          newSeries[i] = (series[i] - currentMean) / currentstandardDeviation;
>                  }
>                  return newSeries;
>          }
>
> }
>
>
>
> public class StandardizeTest {
>
>          /**
>           * Run the test with the values 50 and 100 and assume standardized values with a dinstance of 0.01
>           */
>          @Test
>          public void testRun1() {
>                  double series[] = { 50, 100 };
>                  double expectedSeries[] = { -0.7, 0.7 };
>                  double[] out = Standardize.run(series);
>                  for (int i = 0; i<  out.length; i++) {
>                          assertEquals(out[i], expectedSeries[i], 0.01);
>                  }
>
>          }
>
>          /**
>           * Run with 77 random values, assuming that the outcome has a mean of 0 and a standard deviation of 1.
>           *
>           *
>           *
>           */
>          @Test
>          public void testRun2() {
>                  int length = 77;
>                  double series[] = new double[length];
>
>                  for (int i = 0; i<  length; i++) {
>                          series[i] = Math.random();
>                  }
>
>                  double standardizedSeries[] = Standardize.run(series);
>
>                  DescriptiveStatistics stats = new DescriptiveStatistics();
>
>                  // Add the data from the array
>                  for (int i = 0; i<  length; i++) {
>                          stats.addValue(standardizedSeries[i]);
>                  }
>
>                  double distance = 1E-10;
>                  assertEquals(0.0, stats.getMean(), distance);
>                  assertEquals(1.0, stats.getStandardDeviation(), distance);
>
>          }
>
> }
>
>
>
>
>
>
>
>
>
> -----Original Message-----
> From: Luc Maisonobe [mailto:Luc.Maisonobe@free.fr]
> Sent: 29 September 2010 18:54
> To: Commons Users List
> Subject: Re: [math]How to do standardizing (normalizing)
>
>
> Le 29/09/2010 12:13, VanIngen, Erik (FIPS) a écrit :
>> Hi Apache Commons Math users
>>
>> I am looking for an easy way of standardizing my values a mean 0 and a
>> standard deviation of 1. What is the best way to do that?
>>
>> I have tried this:
>> DescriptiveStatistics stats = new DescriptiveStatistics();
>> // adding values
>> ....
>> // Compute Mean and StandardDeviation
>> double mean  = stats.getMean();
>> double std = stats.getStandardDeviation();
>>
>> and then standardize each value according z = (x- mean)/std
>>
>> But I would like to have just a function of standardize an array
>> according the parameters mean and std. Is there something like this in
>> Apache Math Commons?
>
> I don't think we have such a function.
>
> Luc
>
>>
>> Erik
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
>> For additional commands, e-mail: user-help@commons.apache.org
>>
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
>


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


RE: [math]How to do standardizing (normalizing)

Posted by "VanIngen, Erik (FIPS)" <Er...@fao.org>.
Hi Luc and others,

I have written the standardize function myself (see below, including the tests). Would it be possible to have this added to Apache Commons Math?






/**
 * The standardise function does not seem to be in Apache Commons Math.
 *
 *
 * @author Erik van Ingen
 *
 */
public class Standardize {

        /**
         * Standardise the series, so that in the end it has a mean of 0 and a standard deviation of 1.
         *
         *
         * @param series
         * @return
         */
        public static double[] run(double[] series) {
                DescriptiveStatistics stats = new DescriptiveStatistics();

                // Add the data from the array
                for (int i = 0; i < series.length; i++) {
                        stats.addValue(series[i]);
                }

                // Compute mean and standard deviation
                double currentMean = stats.getMean();
                double currentstandardDeviation = stats.getStandardDeviation();

                // z = (x- mean)/standardDeviation
                double[] newSeries = new double[series.length];

                for (int i = 0; i < series.length; i++) {
                        newSeries[i] = (series[i] - currentMean) / currentstandardDeviation;
                }
                return newSeries;
        }

}



public class StandardizeTest {

        /**
         * Run the test with the values 50 and 100 and expect standardized values within a tolerance of 0.01.
         */
        @Test
        public void testRun1() {
                double series[] = { 50, 100 };
                double expectedSeries[] = { -0.7, 0.7 };
                double[] out = Standardize.run(series);
                for (int i = 0; i < out.length; i++) {
                        assertEquals(out[i], expectedSeries[i], 0.01);
                }

        }

        /**
         * Run with 77 random values, expecting that the outcome has a mean of 0 and a standard deviation of 1.
         *
         *
         *
         */
        @Test
        public void testRun2() {
                int length = 77;
                double series[] = new double[length];

                for (int i = 0; i < length; i++) {
                        series[i] = Math.random();
                }

                double standardizedSeries[] = Standardize.run(series);

                DescriptiveStatistics stats = new DescriptiveStatistics();

                // Add the data from the array
                for (int i = 0; i < length; i++) {
                        stats.addValue(standardizedSeries[i]);
                }

                double distance = 1E-10;
                assertEquals(0.0, stats.getMean(), distance);
                assertEquals(1.0, stats.getStandardDeviation(), distance);

        }

}









-----Original Message-----
From: Luc Maisonobe [mailto:Luc.Maisonobe@free.fr]
Sent: 29 September 2010 18:54
To: Commons Users List
Subject: Re: [math]How to do standardizing (normalizing)


Le 29/09/2010 12:13, VanIngen, Erik (FIPS) a écrit :
> Hi Apache Commons Math users
>
> I am looking for an easy way of standardizing my values a mean 0 and a
> standard deviation of 1. What is the best way to do that?
>
> I have tried this:
> DescriptiveStatistics stats = new DescriptiveStatistics();
> // adding values
> ....
> // Compute Mean and StandardDeviation
> double mean  = stats.getMean();
> double std = stats.getStandardDeviation();
>
> and then standardize each value according z = (x- mean)/std
>
> But I would like to have just a function of standardize an array
> according the parameters mean and std. Is there something like this in
> Apache Math Commons?

I don't think we have such a function.

Luc

>
> Erik
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
>
>


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: [math]How to do standardizing (normalizing)

Posted by Luc Maisonobe <Lu...@free.fr>.
On 29/09/2010 12:13, VanIngen, Erik (FIPS) wrote:
> Hi Apache Commons Math users
> 
> I am looking for an easy way of standardizing my values a mean 0 and a standard deviation of 1. What is the best way to do that?
> 
> I have tried this:
> DescriptiveStatistics stats = new DescriptiveStatistics();
> // adding values
> ....
> // Compute Mean and StandardDeviation
> double mean  = stats.getMean();
> double std = stats.getStandardDeviation();
> 
> and then standardize each value according z = (x- mean)/std
> 
> But I would like to have just a function of standardize an array according the parameters mean and std. Is there something like this in Apache Math Commons?

I don't think we have such a function.

Luc

> 
> Erik
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
> For additional commands, e-mail: user-help@commons.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


[math]How to do standardizing (normalizing)

Posted by "VanIngen, Erik (FIPS)" <Er...@fao.org>.
Hi Apache Commons Math users

I am looking for an easy way of standardizing my values to a mean of 0 and a standard deviation of 1. What is the best way to do that?

I have tried this:
DescriptiveStatistics stats = new DescriptiveStatistics();
// adding values
....
// Compute Mean and StandardDeviation
double mean  = stats.getMean();
double std = stats.getStandardDeviation();

and then standardize each value according to z = (x - mean) / std

But I would like to have just a function that standardizes an array according to the parameters mean and std. Is there something like this in Apache Commons Math?

Erik

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: [math] autocorr

Posted by Ted Dunning <te...@gmail.com>.
Mahout has logistic regression (= logit regression, I think) based on
stochastic gradient descent and
optimized for sparse features.  It might be useful for you as a starting
point.

If you produce reasonably scalable regression code, then Mahout is a great
place for it.
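
As a rough illustration of the idea only (plain dense arrays, not Mahout's actual API or class names), a logistic regression trained by stochastic gradient descent comes down to an update rule like this:

/** Minimal dense logistic regression trained by stochastic gradient descent (illustrative sketch). */
public class SgdLogisticSketch {

    private final double[] weights;
    private final double learningRate;

    public SgdLogisticSketch(int numFeatures, double learningRate) {
        this.weights = new double[numFeatures];
        this.learningRate = learningRate;
    }

    /** Estimated probability that the label is 1 for the given feature vector. */
    public double predict(double[] x) {
        double dot = 0.0;
        for (int i = 0; i < weights.length; i++) {
            dot += weights[i] * x[i];
        }
        return 1.0 / (1.0 + Math.exp(-dot));   // logistic (sigmoid) link
    }

    /** One SGD step on a single example: w_i += rate * (label - p) * x_i. */
    public void train(double[] x, int label) {
        double error = label - predict(x);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += learningRate * error * x[i];
        }
    }
}

A production version adds regularization, a learning-rate schedule, and sparse feature handling, which is where most of the real work is.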

On Mon, Sep 27, 2010 at 10:59 AM, video axescon <vi...@axescon.com> wrote:

> Cholesky decomposition is used often for generating correlated random
> sequences.
>
> if I wanted to update OLS/GLM type of regressions and add Logit regression
> estimation. do you think Mahout could be the right place to contribute? or
> does it sound that it's outside Mahout's domain?
>
> On Mon, Sep 27, 2010 at 1:39 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > Mahout's primary math support is inherited from Colt, but we are actively
> > deleting capabilities from Colt that we don't think
> > will contribute to scalable data mining goal because if we are going to
> use
> > any capability from Colt, we need to spend significant
> > effort to build tests for the code and we don't want to carry around a
> > bunch
> > of code that isn't useful.
> >
> >
>

Re: [math] autocorr

Posted by video axescon <vi...@axescon.com>.
Cholesky decomposition is used often for generating correlated random
sequences.
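
For reference, a minimal sketch of that technique with the Commons Math 2.x linear algebra classes (the covariance values below are just an example):

import java.util.Random;

import org.apache.commons.math.linear.Array2DRowRealMatrix;
import org.apache.commons.math.linear.CholeskyDecompositionImpl;
import org.apache.commons.math.linear.RealMatrix;

public class CorrelatedSequenceSketch {

    public static void main(String[] args) throws Exception {
        // Example covariance matrix; it must be symmetric positive definite.
        RealMatrix covariance = new Array2DRowRealMatrix(new double[][] {
                { 1.0, 0.8 },
                { 0.8, 1.0 }
        });

        // covariance = L * L^T, so L maps independent N(0,1) samples to correlated ones.
        RealMatrix l = new CholeskyDecompositionImpl(covariance).getL();

        Random random = new Random();
        for (int n = 0; n < 1000; n++) {
            double[] independent = { random.nextGaussian(), random.nextGaussian() };
            double[] correlated = l.operate(independent);   // x = L * z
            // ... feed correlated[0], correlated[1] into the simulation ...
        }
    }
}

Commons Math also ships a CorrelatedRandomVectorGenerator in its random package that wraps the same idea, if I remember correctly.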

If I wanted to update OLS/GLM types of regression and add Logit regression
estimation, do you think Mahout could be the right place to contribute? Or
does that sound like it's outside Mahout's domain?

On Mon, Sep 27, 2010 at 1:39 PM, Ted Dunning <te...@gmail.com> wrote:

> Mahout's primary math support is inherited from Colt, but we are actively
> deleting capabilities from Colt that we don't think
> will contribute to scalable data mining goal because if we are going to use
> any capability from Colt, we need to spend significant
> effort to build tests for the code and we don't want to carry around a
> bunch
> of code that isn't useful.
>
>

Re: [math] autocorr

Posted by Ted Dunning <te...@gmail.com>.
Commons Math and Mahout are independent Apache projects with very different
goals and history.  Math has a
much broader goal for general math support while Mahout has a very focused
goal of building scalable data mining capabilities
quickly.  The fact that Mahout doesn't use [math] is unfortunate, but it is
related to the difference in time scales implied by those goals.

Mahout's primary math support is inherited from Colt, but we are actively
deleting capabilities from Colt that we don't think
will contribute to the scalable data mining goal, because if we are going to use
any capability from Colt, we need to spend significant
effort building tests for the code, and we don't want to carry around a bunch
of code that isn't useful.

Specifically, while Mahout doesn't have Cholesky decomposition, it does have
QR decomposition, which is generally just about as
useful.  We haven't yet ported LU decomposition because its utility for very
large systems, which are commonly sparse, is dubious.
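
To make the comparison concrete, one common job for either decomposition is solving least-squares problems (QR directly, Cholesky via the normal equations); in Commons Math 2.x terms (Mahout's own matrix classes differ) a QR-based solve looks roughly like this:

import org.apache.commons.math.linear.Array2DRowRealMatrix;
import org.apache.commons.math.linear.QRDecompositionImpl;
import org.apache.commons.math.linear.RealMatrix;

public class QrLeastSquaresSketch {

    public static void main(String[] args) {
        // Overdetermined system A x ~= b, solved in the least-squares sense via QR.
        RealMatrix a = new Array2DRowRealMatrix(new double[][] {
                { 1.0, 1.0 },
                { 1.0, 2.0 },
                { 1.0, 3.0 }
        });
        double[] b = { 1.0, 2.0, 2.0 };

        double[] x = new QRDecompositionImpl(a).getSolver().solve(b);
        System.out.println("intercept = " + x[0] + ", slope = " + x[1]);
    }
}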

On Mon, Sep 27, 2010 at 10:24 AM, video axescon <vi...@axescon.com> wrote:

> Thank you for clarification. I have to think a little of what to do now.
>
> The thing's that you seem to cherry pick components into both commons-math
> and Mahout, instead of bulk porting. For instance, I found
> autoCorrelation(...) in Descriptive in Mahout, but not in commons-math. At
> the same time, there's no cholesky decomposition in Mahout, and it's in
> commons-math. This is a bit frustrating to me.
>
> On Mon, Sep 27, 2010 at 12:29 PM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > In general, commons math *is* the better choice for general mathematical
> > computing.  Their mission is to provide a general mathematical substrate.
> >
> > Apache Mahout's mission is to provide scalable data mining.  Part of that
> > requires basic math which we took from Colt rather than from commons math
> > due to the compatibility constraints that commons math has.
> >
> > So, if implementing autocorr on top of Commons Math is good for you, that
> > sounds like an excellent option (it is just a dot product with an offset,
> > after all).
> >
> > IF that starts to require something that Commons Math can't easily
> provide,
> > Apache Mahout's math library (which is a separate jar, btw) may be better
> > since we are a bit more agile.   If your time series work starts to
> involve
> > serious scaling pains, then Mahout may be a good substrate from that
> > standpoint as well.
> >
> > On Mon, Sep 27, 2010 at 8:15 AM, video axescon <vi...@axescon.com>
> wrote:
> >
> > > Hello
> > >
> > > I'm a little confused now. I want to work on time series analysis,
> stuff
> > > like GARCH or VAR. Are you suggesting that Mahout can be the proper
> home
> > > for
> > > time series code? I guess it doesn't matter which library to start with
> > as
> > > long as it has good basic stats, optimization and matrix code in it to
> > > start
> > > with. Commons math seemed to be more logical choice to me.
> > >
> > > cheers
> > >
> > >
> > > On Mon, Sep 27, 2010 at 11:04 AM, Ted Dunning <te...@gmail.com>
> > > wrote:
> > >
> > > > Commons math has a strict backwards compatibility constraint.
> > > >
> > > > Apache Mahout does not.
> > > >
> > > > For fixed lag, it should only require a few lines of code in Mahout
> and
> > > you
> > > > should be up and running in a week or so on the trunk version.
> > > >
> > > > On Mon, Sep 27, 2010 at 7:47 AM, video axescon <vi...@axescon.com>
> > > wrote:
> > > >
> > > > > If you have a need for autocorrelation and would like to work with
> us
> > > to
> > > > > > rehabilitate and port the associated Colt code, I would
> > > > > > be happy to help by advising about our nascent conventions about
> > how
> > > we
> > > > > are
> > > > > > organizing our code and what sort of testing and
> > > > > > porting is needed.
> > > > > >
> > > > > >
> > > > > I'm contemplating it. I'm a little bit concerned about the
> > bureaucracy
> > > in
> > > > > this project, it could be easier for me to simply implement it for
> > > > myself.
> > > >
> > >
> >
>

Re: [math] autocorr

Posted by video axescon <vi...@axescon.com>.
Thank you for the clarification. I have to think a little about what to do now.

The thing is that you seem to cherry-pick components into both commons-math
and Mahout, instead of bulk porting. For instance, I found
autoCorrelation(...) in Descriptive in Mahout, but not in commons-math. At
the same time, there's no Cholesky decomposition in Mahout, yet it's in
commons-math. This is a bit frustrating to me.

On Mon, Sep 27, 2010 at 12:29 PM, Ted Dunning <te...@gmail.com> wrote:

> In general, commons math *is* the better choice for general mathematical
> computing.  Their mission is to provide a general mathematical substrate.
>
> Apache Mahout's mission is to provide scalable data mining.  Part of that
> requires basic math which we took from Colt rather than from commons math
> due to the compatibility constraints that commons math has.
>
> So, if implementing autocorr on top of Commons Math is good for you, that
> sounds like an excellent option (it is just a dot product with an offset,
> after all).
>
> IF that starts to require something that Commons Math can't easily provide,
> Apache Mahout's math library (which is a separate jar, btw) may be better
> since we are a bit more agile.   If your time series work starts to involve
> serious scaling pains, then Mahout may be a good substrate from that
> standpoint as well.
>
> On Mon, Sep 27, 2010 at 8:15 AM, video axescon <vi...@axescon.com> wrote:
>
> > Hello
> >
> > I'm a little confused now. I want to work on time series analysis, stuff
> > like GARCH or VAR. Are you suggesting that Mahout can be the proper home
> > for
> > time series code? I guess it doesn't matter which library to start with
> as
> > long as it has good basic stats, optimization and matrix code in it to
> > start
> > with. Commons math seemed to be more logical choice to me.
> >
> > cheers
> >
> >
> > On Mon, Sep 27, 2010 at 11:04 AM, Ted Dunning <te...@gmail.com>
> > wrote:
> >
> > > Commons math has a strict backwards compatibility constraint.
> > >
> > > Apache Mahout does not.
> > >
> > > For fixed lag, it should only require a few lines of code in Mahout and
> > you
> > > should be up and running in a week or so on the trunk version.
> > >
> > > On Mon, Sep 27, 2010 at 7:47 AM, video axescon <vi...@axescon.com>
> > wrote:
> > >
> > > > If you have a need for autocorrelation and would like to work with us
> > to
> > > > > rehabilitate and port the associated Colt code, I would
> > > > > be happy to help by advising about our nascent conventions about
> how
> > we
> > > > are
> > > > > organizing our code and what sort of testing and
> > > > > porting is needed.
> > > > >
> > > > >
> > > > I'm contemplating it. I'm a little bit concerned about the
> bureaucracy
> > in
> > > > this project, it could be easier for me to simply implement it for
> > > myself.
> > >
> >
>

Re: [math] autocorr

Posted by Ted Dunning <te...@gmail.com>.
In general, commons math *is* the better choice for general mathematical
computing.  Their mission is to provide a general mathematical substrate.

Apache Mahout's mission is to provide scalable data mining.  Part of that
requires basic math which we took from Colt rather than from commons math
due to the compatibility constraints that commons math has.

So, if implementing autocorr on top of Commons Math is good for you, that
sounds like an excellent option (it is just a dot product with an offset,
after all).

If that starts to require something that Commons Math can't easily provide,
Apache Mahout's math library (which is a separate jar, btw) may be better
since we are a bit more agile.   If your time series work starts to involve
serious scaling pains, then Mahout may be a good substrate from that
standpoint as well.
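
To make the "dot product with an offset" remark concrete, here is a minimal sketch of a direct (non-FFT) sample autocorrelation for a given lag on top of Commons Math; the class and method names are only illustrative, not an existing API:

import org.apache.commons.math.stat.StatUtils;

public final class AutocorrelationSketch {

    private AutocorrelationSketch() {
    }

    /**
     * Sample autocorrelation at the given lag:
     * r(k) = sum_{t=0..n-k-1} (x[t] - mean) * (x[t+k] - mean) / sum_{t=0..n-1} (x[t] - mean)^2
     */
    public static double autocorrelation(double[] series, int lag) {
        int n = series.length;
        double mean = StatUtils.mean(series);

        double numerator = 0.0;
        for (int t = 0; t < n - lag; t++) {
            numerator += (series[t] - mean) * (series[t + lag] - mean);
        }

        double denominator = 0.0;
        for (int t = 0; t < n; t++) {
            denominator += (series[t] - mean) * (series[t] - mean);
        }
        return numerator / denominator;
    }
}

An ACF over the first 20 lags, as comes up elsewhere in the thread, is then just a loop over lag = 1..20; an FFT only becomes worthwhile when many lags of a long series are needed.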

On Mon, Sep 27, 2010 at 8:15 AM, video axescon <vi...@axescon.com> wrote:

> Hello
>
> I'm a little confused now. I want to work on time series analysis, stuff
> like GARCH or VAR. Are you suggesting that Mahout can be the proper home
> for
> time series code? I guess it doesn't matter which library to start with as
> long as it has good basic stats, optimization and matrix code in it to
> start
> with. Commons math seemed to be more logical choice to me.
>
> cheers
>
>
> On Mon, Sep 27, 2010 at 11:04 AM, Ted Dunning <te...@gmail.com>
> wrote:
>
> > Commons math has a strict backwards compatibility constraint.
> >
> > Apache Mahout does not.
> >
> > For fixed lag, it should only require a few lines of code in Mahout and
> you
> > should be up and running in a week or so on the trunk version.
> >
> > On Mon, Sep 27, 2010 at 7:47 AM, video axescon <vi...@axescon.com>
> wrote:
> >
> > > If you have a need for autocorrelation and would like to work with us
> to
> > > > rehabilitate and port the associated Colt code, I would
> > > > be happy to help by advising about our nascent conventions about how
> we
> > > are
> > > > organizing our code and what sort of testing and
> > > > porting is needed.
> > > >
> > > >
> > > I'm contemplating it. I'm a little bit concerned about the bureaucracy
> in
> > > this project, it could be easier for me to simply implement it for
> > myself.
> >
>

Re: [math] autocorr

Posted by video axescon <vi...@axescon.com>.
Hello

I'm a little confused now. I want to work on time series analysis, stuff
like GARCH or VAR. Are you suggesting that Mahout can be the proper home for
time series code? I guess it doesn't matter which library I start with, as
long as it has good basic stats, optimization and matrix code in it.
Commons Math seemed like the more logical choice to me.

cheers


On Mon, Sep 27, 2010 at 11:04 AM, Ted Dunning <te...@gmail.com> wrote:

> Commons math has a strict backwards compatibility constraint.
>
> Apache Mahout does not.
>
> For fixed lag, it should only require a few lines of code in Mahout and you
> should be up and running in a week or so on the trunk version.
>
> On Mon, Sep 27, 2010 at 7:47 AM, video axescon <vi...@axescon.com> wrote:
>
> > If you have a need for autocorrelation and would like to work with us to
> > > rehabilitate and port the associated Colt code, I would
> > > be happy to help by advising about our nascent conventions about how we
> > are
> > > organizing our code and what sort of testing and
> > > porting is needed.
> > >
> > >
> > I'm contemplating it. I'm a little bit concerned about the bureaucracy in
> > this project, it could be easier for me to simply implement it for
> myself.
>

Re: [math] autocorr

Posted by Ted Dunning <te...@gmail.com>.
Commons math has a strict backwards compatibility constraint.

Apache Mahout does not.

For fixed lag, it should only require a few lines of code in Mahout and you
should be up and running in a week or so on the trunk version.

On Mon, Sep 27, 2010 at 7:47 AM, video axescon <vi...@axescon.com> wrote:

> If you have a need for autocorrelation and would like to work with us to
> > rehabilitate and port the associated Colt code, I would
> > be happy to help by advising about our nascent conventions about how we
> are
> > organizing our code and what sort of testing and
> > porting is needed.
> >
> >
> I'm contemplating it. I'm a little bit concerned about the bureaucracy in
> this project, it could be easier for me to simply implement it for myself.

Re: [math] autocorr

Posted by video axescon <vi...@axescon.com>.
Hello

On Mon, Sep 27, 2010 at 12:10 AM, Ted Dunning <te...@gmail.com> wrote:

> Everything in Colt was untested and as a result there were bugs and
> inconsistencies.
>
> As part of the Mahout project


I haven't heard of this project yet; I will check it out.


> Since autocorrelation depends on FFT's to compute and because it would be
> quite a bit of work to implement good tests FFT's, I
> think that is still the right decision for us to have made.
>
>
It doesn't have to be an FFT to compute simple autocorrelation for a given lag,
or the first 20 lags like in SAS. I'm not sure if Colt was using an FFT here:
http://acs.lbl.gov/software/colt/api/cern/jet/stat/Descriptive.html#autoCorrelation(cern.colt.list.DoubleArrayList,
int, double, double)

Maybe I should have been clearer: there's an autocorr function in MATLAB.
That one is a full ACF analysis with graphs. It's also needed, but not
what I meant in my first email.


If you have a need for autocorrelation and would like to work with us to
> rehabilitate and port the associated Colt code, I would
> be happy to help by advising about our nascent conventions about how we are
> organizing our code and what sort of testing and
> porting is needed.
>
>
I'm contemplating it. I'm a little bit concerned about the bureaucracy in
this project; it could be easier for me to simply implement it for myself.

cheers

Re: [math] autocorr

Posted by Phil Steitz <ph...@gmail.com>.
On 9/27/10 4:17 AM, luc.maisonobe@free.fr wrote:
>
> ----- "Ted Dunning"<te...@gmail.com>  a écrit :
>

> I'm not sure anymore about which project the original question addressed.
> Was it for Mahout or Commons-math ?
>

Well, this is the Commons users list, and [math] appears in the 
subject line, so it is a safe bet that the question is about Commons 
Math.  The answer is that we have not gotten around to implementing 
autocorrelation.  I would be happy to review and apply patches to 
add this feature.  You can add it to the math wish list here:
http://wiki.apache.org/commons/MathWishList

or start a discussion on the Commons Developers list, create a JIRA 
and attach a patch.  See
http://commons.apache.org/patches.html

Thanks!

Phil


> Luc
>


---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: [math] autocorr

Posted by lu...@free.fr.
----- "Ted Dunning" <te...@gmail.com> a écrit :

> Everything in Colt was untested and as a result there were bugs and
> inconsistencies.
> 
> As part of the Mahout project, we have redefined the matrix primitives
> to be
> more amenable to our needs and simpler to extend than the original
> Colt
> arrays.  We have been busily either testing and converting code that
> we
> needed to build a scalable machine learning code and are
> simultaneously
> deleting pretty much everything that doesn't contribute to that goal.

I'm not sure anymore about which project the original question addressed.
Was it for Mahout or Commons-math?

Luc

> 
> Some things are still around and some have already been deleted.  I
> believe
> that I was the one who recently deleted the Descriptive statistics
> back
> based largely on the fact that it wasn't very compatible with the rest
> of
> Colt (it uses DoubleArrayList instead of DoubleArray1D, for instance)
> and
> because most of the functions are relatively trivial.
> 
> Since autocorrelation depends on FFT's to compute and because it would
> be
> quite a bit of work to implement good tests FFT's, I
> think that is still the right decision for us to have made.
> 
> If you have a need for autocorrelation and would like to work with us
> to
> rehabilitate and port the associated Colt code, I would
> be happy to help by advising about our nascent conventions about how
> we are
> organizing our code and what sort of testing and
> porting is needed.
> 
> 
> On Sun, Sep 26, 2010 at 8:36 PM, video axescon <vi...@axescon.com>
> wrote:
> 
> > Hello
> >
> > I cant find autocorrelation function in stats package. Is there a
> reason
> > why
> > it wasn't implemented?
> > It exists in Colt project.
> >
> > cheers
> >

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@commons.apache.org
For additional commands, e-mail: user-help@commons.apache.org


Re: [math] autocorr

Posted by Ted Dunning <te...@gmail.com>.
Everything in Colt was untested and as a result there were bugs and
inconsistencies.

As part of the Mahout project, we have redefined the matrix primitives to be
more amenable to our needs and simpler to extend than the original Colt
arrays.  We have been busily testing and converting the code that we
need to build a scalable machine learning codebase, and are simultaneously
deleting pretty much everything that doesn't contribute to that goal.

Some things are still around and some have already been deleted.  I believe
that I was the one who recently deleted the Descriptive statistics,
based largely on the fact that it wasn't very compatible with the rest of
Colt (it uses DoubleArrayList instead of DoubleArray1D, for instance) and
because most of the functions are relatively trivial.

Since autocorrelation depends on FFTs to compute, and because it would be
quite a bit of work to implement good tests for FFTs, I
think that is still the right decision for us to have made.

If you have a need for autocorrelation and would like to work with us to
rehabilitate and port the associated Colt code, I would
be happy to help by advising about our nascent conventions about how we are
organizing our code and what sort of testing and
porting is needed.


On Sun, Sep 26, 2010 at 8:36 PM, video axescon <vi...@axescon.com> wrote:

> Hello
>
> I cant find autocorrelation function in stats package. Is there a reason
> why
> it wasn't implemented?
> It exists in Colt project.
>
> cheers
>